- All of the formats above describe data at rest/transit (serialized) and not the description of the data in the various programs (dictionaries in Python, maps or ptrees in C++, etc). It is important to separate these ideas as JSON isn't a formal part of any language beyond JS and the ideas should describe when this data is in use as well.
- The positives and negatives of the serialization formats were not discussed, why was JSON chosen as the format of choice? For example, Arrays are common in quantum chemistry, but are known to be slow, lossy, and larger through JSON compared to binary formats.
The introductory paragraph for "Handling Data and Metadata" introduces the need for formats that can be "used from a number of programming languages ranging from compiled languages such as C, C++, and Fortran on supercomputers/high performance computing resources to perform quantum chemistry calculations through to interpreted languages such as Python for data analysis and JavaScript/TypeScript in web frontends or C/C++ in desktop applications." The second paragraph introduces several standards considered, and mentioned "Large data is preferably stored in binary formats, which is where HDF5 was seen as one strong contender and more recently MessagePack has gained traction due to its JSON-like structure and wide language support thanks to its simple binary specification." We have added paragraph 4 to that section to more fully explain our reasoning, and future paths forward.
General comments: