Intro to codemetar, Codemeta creator for R packages

Carl Boettiger

2022-09-02

codemetar: generate codemeta metadata for R packages

Why codemetar? The ‘Codemeta’ Project defines a ‘JSON-LD’ format for describing software metadata, as detailed at https://codemeta.github.io. This package provides utilities to generate, parse, and modify codemeta.jsonld files automatically for R packages, as well as tools and examples for working with codemeta json-ld more generally.

It has three main goals:

Why create a codemeta.json for your package?

Why bother creating a codemeta.json for your package? R packages encode lots of metadata in the DESCRIPTION file, README, and other places, telling users and developers about the package's purpose, authors, license, dependencies, and other information that facilitates discovery, adoption, and credit for your software. Unfortunately, because each programming language records this metadata in a different format, that information is hard for search engines, software repositories, and other developers to find and integrate.

By generating a codemeta.json file, you turn your metadata into a format that can easily be crosswalked to the metadata conventions of many other programming languages. CodeMeta is built on schema.org, a simple structured data format developed by major search engines like Google and Bing to improve discoverability in search. CodeMeta is also understood by significant software archiving efforts such as the Software Heritage project, which seeks to permanently archive all open source software.

For more general information about the CodeMeta Project for defining software metadata, see https://codemeta.github.io. In particular, new users might want to start with the User Guide, while those looking to learn more about JSON-LD and consuming existing codemeta files should see the Developer Guide.

Installation and usage requirements

You can install the latest version from CRAN using:

install.packages("codemetar")

You can also install the development version of codemetar from GitHub with:

# install.packages("devtools")
devtools::install_github("ropensci/codemetar")

For optimal results you need a good internet connection.

The package queries

If your machine is offline, a more minimal codemeta.json will be created. If your internet connection is poor or a firewall blocks the requests, the codemeta creation might hang indefinitely.

Create a codemeta.json in one function call

codemetar can take the path to the source package root to glean as much information as possible.

codemetar::write_codemeta()
library("magrittr")
"../../codemeta.json" %>%
  details::details(summary = "codemetar's codemeta.json",
                   lang = "json")

Most often, you'll simply run codemetar::write_codemeta() from within your package folder and rely on the defaults.

Keep codemeta.json up-to-date

How to keep codemeta.json up-to-date? In particular, how to keep it up to date with DESCRIPTION? codemetar itself no longer supports automatic sync, but there are quite a few methods available out there. Choose one that fits well into your workflow!

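# Read the pre-commit hook template shipped with codemetar
script <- readLines(system.file("templates", "description-codemetajson-pre-commit.sh", package = "codemetar"))
# Install it as the pre-commit hook of the current Git repository
usethis::use_git_hook("pre-commit",
                      script = script)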

Alternatively, you can have a GitHub Actions workflow run codemetar on each commit. If you do this, remember to run git pull before making any new changes in your local project. If you forgot to pull and have already committed new changes, fret not: you can use git pull --rebase to replay your local changes on top of the current upstream HEAD.

The workflow:

on:
  push:
    branches: master
    paths:
      - DESCRIPTION
      - .github/workflows/main.yml

name: Render codemeta
jobs:
  render:
    name: Render codemeta
    runs-on: macOS-latest
    if: "!contains(github.event.head_commit.message, 'cm-skip')"
    steps:
      - uses: actions/checkout@v1
      - uses: r-lib/actions/setup-r@v1
      - name: Install codemetar
        run: Rscript -e 'install.packages("codemetar")'
      - name: Render codemeta
        run: Rscript -e 'codemetar::write_codemeta()'
      - name: Commit results
        run: |
          git commit codemeta.json -m 'Re-build codemeta.json' || echo "No changes to commit"
          git push https://${{github.actor}}:${{secrets.GITHUB_TOKEN}}@github.com/${{github.repository}}.git HEAD:${{ github.ref }} || echo "No changes to commit"


A brief intro to common terms we’ll use:

How to improve your package’s codemeta.json?

The best way to ensure codemeta.json is as complete as possible is to set metadata in all the usual places, and then if needed add more metadata.

To ensure you have metadata in the usual places, you can run codemetar::give_opinions().

Usual terms in DESCRIPTION

  • Fill BugReports and URL.

  • Using the Authors@R notation allows a much richer specification of author roles, correct parsing of given vs family names, and email addresses.

In the current implementation, developers may specify an ORCID url for an author in the optional comment field of Authors@R, e.g.

Authors@R: c(person(given = "Carl",
             family = "Boettiger",
             role = c("aut", "cre", "cph"),
             email = "cboettig@gmail.com",
             comment = c(ORCID = "0000-0002-1642-628X")))

which will allow codemetar to associate an identifier with the person. This is clearly something of a hack since R’s person object lacks an explicit notion of id, and may be frowned upon.
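
As a quick check, the same author entry can be built interactively with base R's utils::person() to confirm where the ORCID ends up. This is only a sketch of where the identifier is stored in the person object; how codemetar reads it internally is not shown here.

```r
# Build the author entry exactly as it appears in the Authors@R field.
author <- utils::person(given = "Carl",
                        family = "Boettiger",
                        role = c("aut", "cre", "cph"),
                        email = "cboettig@gmail.com",
                        comment = c(ORCID = "0000-0002-1642-628X"))

# The comment field is a named character vector; the ORCID is one element.
orcid <- author$comment[["ORCID"]]
print(orcid)

# Roles and name parts are stored as separate fields of the same object.
print(author$role)
print(author$family)
```

Because the identifier lives in a free-form comment, nothing stops a typo in the name "ORCID"; checking the object this way before committing DESCRIPTION is a cheap safeguard.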

Usual terms in the README

In the README, you can use badges for continuous integration, repo development status (repostatus.org or lifecycle.org), and provider (e.g. CRAN).

GitHub repo topics

If your package source is hosted on GitHub and there's a way for codemetar to determine that (URL in DESCRIPTION, or git remote URL), codemetar will use GitHub repo topics as keywords in codemeta.json. If you also set keywords in DESCRIPTION (see next section), codemetar will merge the two lists.

Set even more terms via DESCRIPTION

In general, setting metadata via the places stated earlier is the best solution because that metadata is used by other tools (e.g. the URLs in DESCRIPTION can help the package users, not only codemetar).

The DESCRIPTION file is the natural place to specify any metadata for an R package. The codemetar package can detect certain additional terms in the CodeMeta context. Almost any additional codemeta field can be added to and read from the DESCRIPTION into a codemeta.json file (see codemetar:::additional_codemeta_terms for a list).

CRAN requires that you prefix any additional such terms to indicate the use of schema.org explicitly, e.g. keywords would be specified in a DESCRIPTION file as:

X-schema.org-keywords: metadata, codemeta, ropensci, citation, credit, linked-data

Where applicable, these will override values otherwise guessed from the source repository. Use a comma-separated list to give multiple values for a property, e.g. keywords.
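
The comma-separated convention can be illustrated with base R alone. The sketch below writes a toy DESCRIPTION to a temporary file, reads the X-schema.org-keywords field with read.dcf(), splits it into individual keywords, and merges them with a made-up vector of GitHub topics. The package name, the topics, and the merge order are hypothetical, and this is not codemetar's own parsing code.

```r
# A toy DESCRIPTION for a hypothetical package, written to a temporary file.
desc_file <- tempfile()
writeLines(c("Package: examplepkg",
             "Version: 0.1.0",
             "X-schema.org-keywords: metadata, codemeta, ropensci, citation"),
           desc_file)

# read.dcf() returns a character matrix; take the single requested field.
raw <- read.dcf(desc_file, fields = "X-schema.org-keywords")[1, 1]

# Split on commas and trim surrounding whitespace.
keywords <- trimws(strsplit(raw, ",", fixed = TRUE)[[1]])

# Hypothetical GitHub repo topics, merged with the keywords without duplicates.
github_topics <- c("r", "metadata", "json-ld")
all_keywords <- unique(c(github_topics, keywords))
print(all_keywords)
```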

See the DESCRIPTION file of the codemetar package for an example.

Set the branch that codemetar references

If your code is hosted on GitHub, codemetar references a GitHub branch in a number of places (e.g. for release notes, the README, etc.). By default, codemetar uses the branch name "master", but you can change that to whatever your default branch is by setting the option "codemeta_branch" before calling write_codemeta(). For example, options(codemeta_branch = "main") makes codemetar use the branch named "main".
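
In a script, the option can be set right before the call. The sketch below sets and inspects the option; the write_codemeta() call is guarded so that it only runs when codemetar is installed and the working directory contains a DESCRIPTION file (that guard is an assumption added for this example, not something codemetar requires you to write).

```r
# Tell codemetar which branch to reference when it builds GitHub URLs.
options(codemeta_branch = "main")

# Confirm the value that will be picked up.
print(getOption("codemeta_branch"))

# Only generate the file when codemetar is available and we appear to be
# in a package root (the DESCRIPTION check is an illustrative safeguard).
if (requireNamespace("codemetar", quietly = TRUE) && file.exists("DESCRIPTION")) {
  codemetar::write_codemeta()
}
```

Setting the option in a project-level .Rprofile would have the same effect for every session started in that project.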

Going further

Check out all the codemetar vignettes for tutorials on other cool stuff you can do with codemeta and json-ld.

A new feature is the creation of a minimal schemaorg.json that you can insert into your website for Search Engine Optimization. It is written when the write_minimeta argument of write_codemeta() is TRUE.

You could e.g. use the code below in a chunk in README.Rmd with results="asis".

glue::glue('<script type="application/ld+json">
      {glue::glue_collapse(readLines("schemaorg.json"), sep = "\n")}
    </script>')

Refer to Google's documentation for more guidance.