ROUGH DRAFT authorea.com/93829
  • PyMKS Package Paper

    Introduction

    • Current practices for developing Materials Data Analytics Toolsets are highly localized within a few individual groups, resulting in major inefficiencies (unnecessary duplication of code, inadequate verification and validation of multiple instantiations of code, not engaging the right talent for the right task, etc.).
    • Community development and sharing of code repositories has been successful in certain science communities. Advantages of this approach include increased e-teaming and e-collaboration, vastly improved code hygiene, promotion of open science, rapid verification and validation, and a dramatic increase in overall productivity. All of these are key ingredients for realizing a major scale-up in innovation in science and technology.

    • Community development of a materials data analytics toolset can significantly change the landscape of materials innovation through the emergence and adoption of the cross-disciplinary field of Materials Informatics, and can help realize the vision outlined in the Materials Genome Initiative (MGI) and Integrated Computational Materials Engineering (ICME). This is probably the only practical way to get materials scientists and computer scientists to establish meaningful, mutually beneficial, and highly productive collaborations. Current efforts in materials science are:

    DFT/First Principles - not sure if it makes sense to break it down by type of calculation

    • The Materials Project is currently powered by pymatgen (Ong 2013) - Integrated Python tools for first-principles calculations (DFT and MD) of thermodynamic and electrical quantities, with applications in functional materials.
    • Open Quantum Materials Database (OQMD) (Saal 2013) - Open source data repository for phase diagrams and electronic ground states computed using DFT. OQMD also offers visualization tools and a Python API.
    • OpenKIM for atomic potentials
    • Citrine ?

    All of the above are essentially data/information repositories. Maybe we can break this down as Data/Information Repositories, Code Repositories, and e-collaboration platforms. Perhaps do not say too much about the first and last, just mention some names and provide some links, and focus more of the review and the paper on the second one?

    ICME

    • QuesTek ? - I do not think they do anything in the "open" arena

    In terms of code repositories, I think there are also codes dedicated to classes of problems. Would LAMMPS fall into this category? How about SPPARKS and MOOSE? DREAM.3D claims to be a code repository as well. In Europe there is DAMASK from Max Planck in Düsseldorf.

    Data Storage

    • NIST Data Gateway http://srdata.nist.gov/gateway/gateway?dblist=0 - Over 100 free and paid queryable online materials databases. Example data include atomic structure, thermodynamics, kinetics, fundamental physical constants, and x-ray spectroscopy.
    • NIST DSpace -
    • NIST Materials Data Curation System (MDCS) - General online database for materials data.
    • Computational Materials Data Network (CMD Network) http://www.asminternational.org/web/cmdnetwork/home - Founded by ASM International, CMD Network is a data-focused project that facilitates data warehousing, sharing, and collaboration.
    • Granta ?
    • MatWeb http://www.matweb.com/ - Database containing materials properties for over 100,000 materials.
    • Materials Atlas (link not working) - Database storing 3D results from experiments and simulations.

    Structure Tools

    • DREAM.3D - Software tool for synthetic microstructure generation, image processing, and mesh creation for finite element analysis.
    • PyMKS aims to seed and nurture an emergent user group in materials data analytics for establishing homogenization and localization linkages by leveraging open source scientific and machine learning packages in Python. The approach used to develop PyMKS, as well as several examples, are presented. This paper is a call to others interested in participating in this open science activity.

    Development Approach

    This section outlines our approach to development and how we hope to engage the wider community in its use. None of this should be specific to MKS.

    • Use abstractions from other libraries, don't invent our own
    • Use open source dependencies
    • Have a permissive licence
    • Have interactive online notebooks (**)
    • Use VMs with the full stack (**)
    • Test suite integrated with the documentation and the examples (**)
    • Use Python due to its large scientific code base, high-level syntax, and wide adoption in machine learning
    • Use a continuous integration tool (**)
    • Integration of docs and tests
    • Work with MGI practitioners' code bases (I'm doing this with Shengyen) and integrate parts into MKS (**)
    • (**) Need to be addressed and we should address before publication
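
    One possible way to tie the test suite to the documentation and examples, as listed above, is Python's standard doctest module, where the example in a docstring doubles as a test. The sketch below is only an illustration under that assumption: `volume_fraction` and `count_doc_failures` are hypothetical helpers written for this draft, not PyMKS functions.

```python
import doctest


def volume_fraction(microstructure, phase):
    """Return the fraction of cells in `microstructure` equal to `phase`.

    The example below is documentation and a test at the same time.

    >>> volume_fraction([[0, 1], [1, 1]], phase=1)
    0.75
    """
    cells = [cell for row in microstructure for cell in row]
    return sum(cell == phase for cell in cells) / len(cells)


def count_doc_failures(func):
    """Run the examples in the docstring of `func` and return how many failed."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test).failed


if __name__ == "__main__":
    # A continuous integration job could run this and fail on a nonzero count.
    print("failed examples:", count_doc_failures(volume_fraction))
```

    A continuous integration tool could run the same docstring examples on every commit, so the documentation cannot silently drift away from the code.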

    Theory

    Very short section on theory primarily referencing other papers.

    • Explain why homogenization and localization are needed for materials design
    • Digital Microstructure function
    • Basis selection for local states
    • Structure Quantification with 2-point statistics
    • Homogenization (2-Point Statistics Equation)
      • Workflow: Spatial Correlations -> Dimensionality Reduction -> Calibration -> Prediction
    • Localization (MKS equation) with consistent notation to the online documentation
      • Workflow: Calibration -> Scale up coefficients -> Prediction
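
    The homogenization workflow above (Spatial Correlations -> Dimensionality Reduction -> Calibration -> Prediction) can be sketched with NumPy alone. This is a toy illustration, not the PyMKS implementation: the microstructures are random, the "effective property" is a made-up rule of mixtures, the correlations assume periodic boundaries, and NumPy's SVD and least squares stand in for the PCA and regression estimators that scikit-learn would provide. All function names are hypothetical.

```python
import numpy as np


def autocorrelation(indicator):
    """Periodic 2-point autocorrelation of one phase indicator field."""
    spectrum = np.fft.fftn(indicator)
    return np.fft.ifftn(spectrum * np.conj(spectrum)).real / indicator.size


def make_toy_dataset(n_samples, shape, rng):
    """Random two-phase 2D microstructures with a made-up property."""
    fractions = rng.uniform(0.2, 0.8, size=n_samples)
    micros = (rng.random((n_samples,) + shape) < fractions[:, None, None])
    micros = micros.astype(float)
    # Toy property (assumption): rule of mixtures on the volume fraction.
    prop = 1.0 + 4.0 * micros.mean(axis=(1, 2))
    return micros, prop


def spatial_statistics(micros):
    """Step 1: one flattened autocorrelation vector per microstructure."""
    return np.array([autocorrelation(m).ravel() for m in micros])


def fit_homogenization(micros, prop, n_components):
    """Steps 2 and 3: PCA via SVD, then linear least-squares calibration."""
    stats = spatial_statistics(micros)
    mean = stats.mean(axis=0)
    _, _, vt = np.linalg.svd(stats - mean, full_matrices=False)
    components = vt[:n_components]
    scores = (stats - mean) @ components.T
    design = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(design, prop, rcond=None)
    return mean, components, coef


def predict_homogenization(model, micros):
    """Step 4: project new statistics onto the components and predict."""
    mean, components, coef = model
    scores = (spatial_statistics(micros) - mean) @ components.T
    return coef[0] + scores @ coef[1:]


rng = np.random.default_rng(0)
train_x, train_y = make_toy_dataset(60, (16, 16), rng)
test_x, test_y = make_toy_dataset(20, (16, 16), rng)
model = fit_homogenization(train_x, train_y, n_components=3)
predicted = predict_homogenization(model, test_x)
print("correlation with toy property:", np.corrcoef(predicted, test_y)[0, 1])
```

    The point of the sketch is the shape of the pipeline: each stage consumes only the output of the previous one, which is what makes it natural to assemble from small interoperating libraries.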

    Code Structure and Functionality

    Start with a concept map of functions, classes and models with the arrows demonstrating the interactions. Explain why high level functionality is so important when implementing and sharing code (knowledge). Python is good because the community creates small libraries that interact, which is what PyMKS is.

    • concept map
    • High-level functionality is important. PyMKS provides high-level APIs to access the Materials Knowledge Systems framework and empowers novice users to apply the framework's approaches.
    • How did we choose our dependencies (SfePy, scikit-learn)? PyMKS leverages the scientific computing and data science communities in the Python ecosystem. PyMKS is built on NumPy, SciPy, and scikit-learn.
    • What have we done about optimization? Some benchmarks.
    • autocorrelate
    • crosscorrelate
    • correlate
    • MKSLocalizationModel
    • MKSHomogenizationModel
    • bases
      • PrimitiveBasis
      • LegendreBasis
      • FourierBasis
      • GeneralizedSphericalHarmonicsBasis
    • User decision tree diagram similar to sklearn mlmap
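
    To make the correlation and basis entries in the list above more concrete, here is a minimal NumPy sketch of what a primitive (indicator) basis and a periodic cross-correlation compute. The helper names `primitive_indicators` and `crosscorrelate_periodic` are hypothetical and the signatures are assumptions made for this illustration; they are not the PyMKS API and ignore options such as non-periodic boundaries.

```python
import numpy as np


def primitive_indicators(microstructure, n_states):
    """Discretize an integer-labeled microstructure into indicator fields.

    Returns an array of shape (n_states,) + microstructure.shape whose
    h-th slice is 1 where the local state equals h and 0 elsewhere.
    """
    shape = (n_states,) + (1,) * microstructure.ndim
    states = np.arange(n_states).reshape(shape)
    return (microstructure[None, ...] == states).astype(float)


def crosscorrelate_periodic(field_a, field_b):
    """Periodic cross-correlation, normalized by the number of cells.

    Entry r is (1/S) * sum_s field_a[s] * field_b[s + r], with indices
    wrapping around. Passing the same field twice gives the autocorrelation.
    """
    fa = np.fft.fftn(field_a)
    fb = np.fft.fftn(field_b)
    return np.fft.ifftn(np.conj(fa) * fb).real / field_a.size


micro = np.array([[0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 0, 1, 1],
                  [1, 0, 0, 1]])
m = primitive_indicators(micro, n_states=2)
auto_11 = crosscorrelate_periodic(m[1], m[1])
cross_01 = crosscorrelate_periodic(m[0], m[1])

# The zero-shift autocorrelation is the volume fraction of state 1.
print("volume fraction of state 1:", auto_11[0, 0])
# At every shift, the two statistics sum to that same volume fraction,
# which is why not all correlations are independent.
print("redundancy holds:", np.allclose(auto_11 + cross_01, m[1].mean()))
```

    The redundancy check in the last line is one reason a user decision tree is helpful: for a two-state primitive basis, a single autocorrelation already carries the full set of 2-point statistics.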