Metadata section of the white paper

This topic contains 1 reply, has 2 voices, and was last updated by  Martin Thomas 3 weeks, 1 day ago.

  • #8385

    Vasily
    Participant

    First of all, for the benefit of others, let me share the link to the white paper in question: https://emmc.info/emmc-csa-white-paper-for-standards-of-modelling-software-development/ I hope this is the right link; please tell me if it is not the most current version of the white paper.

    I have looked into the “Metadata” section more closely than the other sections, and here are my comments.

    A) Metadata is most useful when it is consciously designed and produced with a certain purpose (or range of purposes) in mind. The “Metadata” section states its purpose: “In particular, it is a prerequisite of making software citable and the created results reproducible.” However, the suggested metadata elements do not seem to fully support this reasonable purpose. In particular, a clear identity is currently required only for the user who runs the calculation, but not for the software, the hardware platform, the input data, the output data, or the calculation as a distinct whole act of combining all the other elements. All the entities involved in the calculation, as well as the calculation itself, should ideally be assigned persistent identifiers (PIDs): only then can we hope that the calculation becomes citable and reasonably reproducible. The good news is that many of the metadata elements of interest to EMMC may well already have PIDs assigned to them (or established practices for assigning such PIDs), e.g. data or software may have DataCite DOIs, and a user may have an ORCID, ISNI or other researcher identifier. A stable URL/URI can be a good proxy for a persistent identifier if there is no “true PID” like a DOI in place, e.g. Creative Commons licenses can be reasonably identified by their URLs. Still, some kind of persistent identification (one that is universal beyond a particular software platform) is worth having for all metadata elements.
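
    To make comment A concrete, here is a minimal sketch of a calculation record in which every entity, and the calculation itself, can carry a PID. The class names, role names and identifier values are hypothetical placeholders chosen for illustration; none of them come from the white paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """One entity involved in a calculation, with an optional identifier."""
    role: str                  # e.g. "user", "software", "input_data" (assumed role names)
    label: str                 # human-readable name
    pid: Optional[str] = None  # DOI, ORCID iD, stable URL, ...


@dataclass
class CalculationRecord:
    """The calculation as a whole act that combines the other entities."""
    label: str
    pid: Optional[str] = None
    entities: List[Entity] = field(default_factory=list)

    def missing_pids(self) -> List[str]:
        """List the roles (and the calculation itself) that lack an identifier."""
        missing = [e.role for e in self.entities if not e.pid]
        if not self.pid:
            missing.append("calculation")
        return missing


# All identifiers below are placeholders, not real PIDs.
record = CalculationRecord(
    label="example run",
    entities=[
        Entity("user", "A. Researcher", pid="https://orcid.org/0000-0000-0000-0000"),
        Entity("software", "hypothetical simulation platform"),
        Entity("hardware", "hypothetical cluster"),
        Entity("input_data", "input deck", pid="https://doi.org/10.0000/placeholder"),
        Entity("output_data", "result files"),
    ],
)

print(record.missing_pids())
```

    A check like `missing_pids()` shows at a glance which parts of a calculation are not yet citable.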

    B) Calculations may have their own lifecycle, with different levels of maturity at each stage. Initially, a user may just run calculations for test purposes, or otherwise play with the software platform. Then some calculations may be run to collect statistics. Then certain “exemplar” calculations may be chosen for sharing with a wider research community, e.g. by referring to them from a journal article. When we speak of calculation metadata, we should be clear about which stage of the calculation lifecycle the metadata applies to. Alternatively, the metadata may be designed so that it is applicable in principle to the entire calculation lifecycle, with certain stages requiring data elements that are optional for other stages. As an example, an “exemplar” calculation to be shared is reasonably expected to have a persistent identifier to make it truly citable, but test calculations or “uninteresting” calculations that nobody intends to share may not need a PID. Therefore, metadata elements may be marked as mandatory or optional, with specific recommendations for the different stages of the calculation lifecycle.
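
    The idea of stage-dependent modality can be sketched as a small validation table. The stage names follow the three stages described above, but the element names and the choice of which elements are mandatory at each stage are assumptions made only for illustration.

```python
# Assumed mapping: which metadata elements are mandatory at each lifecycle stage.
# Any element not listed for a stage is treated as optional there.
REQUIRED_BY_STAGE = {
    "test": {"user"},
    "statistics": {"user", "software"},
    "exemplar": {"user", "software", "input_data", "output_data", "calculation_pid"},
}


def missing_elements(metadata: dict, stage: str) -> list:
    """Return the mandatory elements for `stage` that are absent or empty."""
    required = REQUIRED_BY_STAGE[stage]
    return sorted(name for name in required if not metadata.get(name))


# Placeholder values, not real identifiers.
draft = {
    "user": "https://orcid.org/0000-0000-0000-0000",
    "software": "hypothetical platform 1.0",
}

print(missing_elements(draft, "test"))
print(missing_elements(draft, "exemplar"))
```

    The same record passes as a test calculation but is flagged as incomplete once it is promoted to an exemplar, which is exactly where a PID for the calculation becomes mandatory.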

    C) The current list of proposed metadata elements does not contain the notion of a “model”. Ideally, the model should be separated from the “software”, because the very same software, e.g. a highly configurable simulation platform, may be able to implement very different models that the user has in mind. Describing models with metadata is a challenging subject, especially if we want universal metadata for models that is separated from platform-specific software features, so it is important not to be daunted by this challenge. A simple approach that would capture some aspects of the models used is to have two metadata sections for keywords of different kinds: one section for “subject keywords” (which describe the subject or purpose of the simulation), and another section for “configuration keywords” (which describe the configuration of the software for a particular simulation, e.g. as key/value pairs).
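
    A rough sketch of the two keyword sections might look as follows. The field names mirror the two kinds of keywords proposed above; the example keywords and configuration keys are invented for illustration and are not taken from any particular platform.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ModelDescription:
    """Platform-independent hints about the model behind a simulation."""
    subject_keywords: List[str] = field(default_factory=list)               # what is simulated, and why
    configuration_keywords: Dict[str, str] = field(default_factory=dict)    # how the software was set up


def configuration_diff(
    a: ModelDescription, b: ModelDescription
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return the configuration keys whose values differ between two descriptions."""
    keys = sorted(set(a.configuration_keywords) | set(b.configuration_keywords))
    return {
        k: (a.configuration_keywords.get(k), b.configuration_keywords.get(k))
        for k in keys
        if a.configuration_keywords.get(k) != b.configuration_keywords.get(k)
    }


# Two runs of the same (hypothetical) software that implement different models.
run_a = ModelDescription(
    subject_keywords=["diffusion", "polymer melt"],
    configuration_keywords={"method": "molecular dynamics", "ensemble": "NVT"},
)
run_b = ModelDescription(
    subject_keywords=["diffusion", "polymer melt"],
    configuration_keywords={"method": "molecular dynamics", "ensemble": "NPT"},
)

print(configuration_diff(run_a, run_b))
```

    Even this coarse description is enough to tell apart two calculations that share the software and the subject but differ in the model configuration.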

    D) It is all right to design specific metadata, yet it is good practice to connect its elements to elements in existing metadata models, to terms in established vocabularies, or to PIDs (see comment A). This is the way to reuse someone else’s effort rather than spend your own, or to contribute your own suggestions in cases where the existing metadata, vocabularies or PIDs require extensions or amendments to better serve your specific case. Being a part of the FREYA project https://www.project-freya.eu/en , I am happy to discuss the challenges and opportunities of using all kinds of PIDs for identifying the elements of the proposed EMMC metadata. Being a part of the Semantic Assets for Materials Science Task Group within the RDA Vocabulary Services Interest Group, I am happy to try to connect the EMMC metadata design effort to the work of that Task Group.

  • #9035

    Martin Thomas
    Participant

    For making software citable, very reasonable proposals have been developed by the Citation File Format initiative; see:

    https://citation-file-format.github.io/

    The format provides a well-suited catalogue of metadata for software, treated separately from any models the software includes.
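
    As a rough illustration, a minimal `CITATION.cff` file could be generated along these lines. The helper function is hypothetical, the software name, author and identifiers are placeholders, and the field names reflect the commonly documented keys of the format; any real file should be validated against the official schema.

```python
def render_citation_cff(title, version, family_names, given_names, doi=None, orcid=None):
    """Render a minimal CITATION.cff document as text (hypothetical helper)."""

    def quoted(value):
        # Double-quote the value and escape characters that are special in YAML strings.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        "title: " + quoted(title),
        "version: " + quoted(version),
    ]
    if doi:
        lines.append("doi: " + quoted(doi))
    lines.append("authors:")
    lines.append("  - family-names: " + quoted(family_names))
    lines.append("    given-names: " + quoted(given_names))
    if orcid:
        lines.append("    orcid: " + quoted(orcid))
    return "\n".join(lines) + "\n"


# Placeholder values only: not a real DOI or ORCID iD.
text = render_citation_cff(
    title="Hypothetical Simulation Platform",
    version="1.0.0",
    family_names="Researcher",
    given_names="A.",
    doi="10.0000/placeholder",
    orcid="https://orcid.org/0000-0000-0000-0000",
)
print(text)
```

    Note that such a file describes the software only; a description of the model would still need to live elsewhere in the calculation metadata.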
