Blog post - Quality of Metadata and Identifiers
What does OA Switchboard do to make a difference?

29 September 2022 by Yvonne Campfens, Executive Director OA Switchboard

Context

There is widespread appreciation for the importance of metadata and (persistent) identifiers in the research ecosystem, and in scholarly publishing in particular (see the references below). While better metadata brings real rewards, the reality is that the quality of (openly) available metadata and identifiers is far from ideal.

It is therefore encouraging to see the results of the concerted efforts made as part of the OA Switchboard initiative.

Principles

When it comes to ‘improving scholarly publishing metadata’, we’ve learned a thing or two:

Metadata and persistent identifiers (PIDs) should be determined and captured at the source and by the source
Metadata values can change over time, and it is important to capture the event-based and time-specific values of metadata
Organisational stakeholders (research funders, publishers and institutions) should be in control of their own data, and therefore keep their own data secure

These principles are described in more detail below.

How does OA Switchboard contribute to improving the quality of metadata and identifiers?

What does the OA Switchboard not do?

Clean up metadata at the source
Build the (custom) solutions for participants to connect
Take responsibility for what comes out of OA Switchboard (metadata exports)

OA Switchboard is a ‘message hub’ (‘mailman’) only. Our General Terms & Conditions state: “OA Switchboard doesn’t have and doesn’t claim any ownership to or responsibility for Messages, Message Content or Reports.” It is the responsibility of the publishers to ensure that accurate and complete metadata is entered. OA Switchboard technically checks the ‘messages’ against the message schema, but does not validate the metadata values themselves.

The publisher is to source the data from its systems and is solely responsible for configuring its systems and routers to connect to the OA Switchboard API, if applicable, and for bearing any cost associated with that.
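To make the difference between a technical schema check and validating the metadata itself more concrete, here is a minimal Python sketch. The field names (`doi`, `corresponding_author`, `affiliation_ror`) and the tiny ‘schema’ are hypothetical illustrations, not the actual OA Switchboard message schema. The check only confirms that required fields are present with the expected types; it cannot tell whether the values are correct.

```python
from typing import Any

# Hypothetical, simplified "schema": required field name -> expected Python type.
# This is NOT the real OA Switchboard message schema.
REQUIRED_FIELDS: dict[str, type] = {
    "doi": str,
    "corresponding_author": str,
    "affiliation_ror": str,
}


def schema_errors(message: dict[str, Any]) -> list[str]:
    """Return structural problems only: missing fields or wrong types."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in message:
            errors.append(f"missing field: {field}")
        elif not isinstance(message[field], expected_type):
            errors.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return errors


# Structurally valid, even though the ROR value is obviously not a real ROR ID
# (the DOI is a placeholder too): the schema check passes regardless.
message = {
    "doi": "10.1234/example.5678",
    "corresponding_author": "A. Author",
    "affiliation_ror": "not-a-real-ror-id",
}
print(schema_errors(message))  # []
print(schema_errors({"doi": 42}))
```

Responsibility for the correctness of the values therefore stays with the party that enters them.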

What does the OA Switchboard do?

Within our mission and remit, we have learned and done a lot to help improve scholarly metadata over the past few years:

Build a community of like-minded spirits
We organise and facilitate a community of like-minded spirits to share experiences, best practices and lessons learned. For example, the concept of a custom connector and private datastore, which publishers build and use in their own environment before connecting to OA Switchboard, took shape here. ‘Custom connectors’ source data from multiple systems and data feeds within the publisher’s system landscape, and a ‘private datastore’ handles asynchronous data and builds article life-cycle metadata.
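The idea of a private datastore that turns asynchronous feeds into article life-cycle metadata can be sketched in a few lines of Python. This is a hypothetical illustration: the class names, event names and fields are assumptions, not a description of any publisher’s actual connector or datastore. Records arriving out of order from different internal systems are grouped by DOI and ordered by event date, so each article’s life cycle can be read back in sequence.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FeedEvent:
    """One record from a (hypothetical) internal publisher system feed."""
    doi: str
    event: str          # e.g. "submission", "acceptance", "vor_publication"
    on: date            # when the event happened (not when the feed arrived)
    source_system: str  # which internal system reported it


class PrivateDatastore:
    """Minimal in-memory store that assembles per-article life-cycle metadata."""

    def __init__(self) -> None:
        self._events: dict[str, list[FeedEvent]] = defaultdict(list)

    def ingest(self, record: FeedEvent) -> None:
        # Feeds arrive asynchronously and in any order, so keep every record.
        self._events[record.doi].append(record)

    def life_cycle(self, doi: str) -> list[str]:
        # Rebuild the article's life cycle by ordering events on their event date.
        records = sorted(self._events.get(doi, []), key=lambda r: r.on)
        return [r.event for r in records]


store = PrivateDatastore()
# Records from different systems, arriving out of order (placeholder DOI):
store.ingest(FeedEvent("10.1234/abc", "vor_publication", date(2022, 6, 1), "production"))
store.ingest(FeedEvent("10.1234/abc", "submission", date(2022, 1, 10), "peer_review"))
store.ingest(FeedEvent("10.1234/abc", "acceptance", date(2022, 4, 2), "peer_review"))
print(store.life_cycle("10.1234/abc"))  # ['submission', 'acceptance', 'vor_publication']
```

A real datastore would persist records and reconcile conflicting values, but the principle of storing first and ordering later is what lets it cope with asynchronous feeds.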

Another crucial and effective component of our community is that it provides an environment where participants can learn, adjust and advance. There are direct lines of communication to share feedback on metadata and a spirit to collaboratively improve quality.

Two to three times per year, we organise webinars to present OA Switchboard to the wider community and to share and discuss.

OA Switchboard ‘hub’ itself as ‘killer app’
To benefit from using the OA Switchboard message hub, a publisher’s metadata needs to meet a certain quality level. Participating in and connecting to OA Switchboard therefore gives publishers a motivation and an incentive to improve data quality, so that they can deliver structured data output (JSON). This drives standardisation and transparency.

Better management information for publishers
By collecting and processing the multiple data feeds that provide the required data fields for OA Switchboard messages, publishers are able to collect and structure their own data across different systems. This provides valuable management information and insight into the status of their metadata quality and potential improvement actions. The structured outputs of OA Switchboard (the ‘messages’ composed and sent) can also be exported by publishers themselves and used to further shape pricing models or business and OA strategies.
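As one example of such management information, a publisher could measure how complete key fields are across its exported messages. The Python sketch below assumes, hypothetically, that the export is a JSON array of records; the field names and values are illustrative only and do not reflect the actual OA Switchboard export format.

```python
import json

# Hypothetical export: a JSON array of message-like records.
# Field names and values are illustrative, not the OA Switchboard export format.
exported = json.loads("""
[
  {"doi": "10.1234/a", "affiliation_ror": "https://ror.org/00example1", "funder": "Funder X"},
  {"doi": "10.1234/b", "affiliation_ror": null, "funder": "Funder Y"},
  {"doi": "10.1234/c", "affiliation_ror": "", "funder": null},
  {"doi": "10.1234/d", "affiliation_ror": "https://ror.org/00example2"}
]
""")


def completeness(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Share of records (0.0 to 1.0) in which each field has a non-empty value."""
    if not records:
        return {f: 0.0 for f in fields}
    # Missing keys, null and empty strings all count as "not filled".
    return {f: sum(1 for r in records if r.get(f)) / len(records) for f in fields}


for field_name, share in completeness(exported, ["doi", "affiliation_ror", "funder"]).items():
    print(f"{field_name}: {share:.0%}")
```

Tracking such shares over time is one simple way to see whether improvement actions at the source are paying off.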

Re-use of ‘smart matching’ tool outside OA Switchboard
As part of OA Switchboard, and for use within it, we have developed an open source ‘smart matching’ module. This tool helps to get from author affiliation data lacking PIDs to ROR IDs for all authors’ affiliations. Publishers can also apply it upstream, in their own systems and workflows. The tool was presented at the ROR Community Meeting (February 2022); a recording of the full presentation and the open source code are available online.
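The general idea of matching free-text affiliations to ROR IDs can be illustrated with a deliberately naive Python sketch. It does not reproduce the logic of the OA Switchboard ‘smart matching’ module: it simply normalises an affiliation string and looks for known organisation names in a small local table. The organisation names and ROR IDs in the table are made-up placeholders, not real ROR records.

```python
import re
import unicodedata
from typing import Optional

# Placeholder lookup table: normalised organisation name -> ROR ID.
# These IDs are invented for illustration; they are NOT real ROR records.
KNOWN_ORGANISATIONS = {
    "example university": "https://ror.org/00example1",
    "example institute of technology": "https://ror.org/00example2",
}


def normalise(name: str) -> str:
    """Strip accents, lower-case, replace punctuation with spaces, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    letters_only = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(letters_only.split())


def match_affiliation(raw: str) -> Optional[str]:
    """Return a ROR ID if a known organisation name appears as whole words."""
    text = f" {normalise(raw)} "
    # Try longer names first so the most specific match wins.
    for name in sorted(KNOWN_ORGANISATIONS, key=len, reverse=True):
        if f" {name} " in text:
            return KNOWN_ORGANISATIONS[name]
    return None  # no match: leave for manual curation rather than guess


print(match_affiliation("Dept. of Physics, Example University, Springfield"))
print(match_affiliation("Unknown Lab"))
```

A production matcher would draw on the full ROR registry, name variants and confidence scoring; the sketch only shows why normalisation matters before any lookup.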

Market research
In Fall 2022 we are conducting a shared research project with Ludo Waltman of the Centre for Science and Technology Studies (CWTS), Leiden University, and a number of OA Switchboard publishers. The aim is to increase awareness of the opportunities to use ROR IDs available at (author and publisher) source to enrich affiliation data in Crossref. In our 27 October webinar, the results will be presented in a broader perspective: “What are the opportunities to use ROR id’s available at source to enrich openly available author affiliation data? What is the status quo?”. Registration is open.


Next steps

We will continue to build on the collective awareness that there is a real (business) need for good metadata and persistent identifiers, and to stress the importance of ‘authoritative data from source’. Organisations should take charge of their own data, but also contribute to enriching openly available metadata.

We will keep working on incremental improvements through practical tools and solutions.

The content of this blog post was also presented at the ORFG First Community Call: Improve Research Output Tracking - PIDs and Metadata (Tuesday 27 September 2022) and at the CHORUS Forum: Improving Scholarly Publishing Metadata (Thursday 29 September 2022).

References:

Brown, Josh, Jones, Phill, Meadows, Alice, & Murphy, Fiona. (2022). The case for investment in a UK persistent identifier strategy: Resilience, insight, and leadership in global research and innovation (Version v2). Zenodo. https://doi.org/10.5281/zenodo.6012367
Brown, Josh, Jones, Phill, Meadows, Alice, & Murphy, Fiona. (2022, September 16). PID-optimised workflows: A vision of a more efficient future. Zenodo. https://doi.org/10.5281/zenodo.7085489 “...We present four expansive diagrams, each of which is intended to showcase a possible future, in which persistent identifiers (PIDs) are used throughout the research lifecycle to enable automation, efficiency, new discovery tools, and analysis. The use of open PIDs throughout also supports greater transparency and reproducibility in research activities and communications…”
OA Switchboard blog post (2021, December 8). Thinking big: How persistent identifiers (PIDs) can be used to reduce friction in the ongoing transition to open research. https://www.oaswitchboard.org/blog8dec2021 “...Day-to-day, the OA Switchboard is enabling the neutral exchange of OA-related publication level information. In terms of developing and maintaining our central information exchange hub as an operational solution, this is all about shared infrastructure, standardised messaging protocol, and integration with existing systems, powered by Persistent Identifiers (PIDs). The vital contribution that PIDs can make to systemic efficiencies was highlighted in a recent UK Government's policy paper on reducing bureaucratic burdens on research, innovation and higher education. Commissioned by Jisc, the UK PID Consortium published a cost-benefit analysis in June 2021, which identified cost savings associated with rekeying grant, project, and article metadata. Other savings, in the form of automation and aggregation/analysis were reported as likely to be significant. Economic benefits of better data for decision-making by both public and private sector bodies was reported to have even more potential. Other tangible benefits reported included consistency of approach, portability of metadata and workflows, and increased ease of collaboration…”
Brown, Josh, Jones, Phill, Meadows, Alice, Murphy, Fiona, & Clayton, Paul. (2021). Zenodo. https://doi.org/10.5281/zenodo.4772627

Principles in more detail:

Authoritative data from source
Metadata and persistent identifiers (PIDs) should be determined and captured at the source and by the source. The ‘source’ can be the research funder, the researcher/author, the publisher, the institution, etc.

The reasons are that this is where the data originates and where the authoritative knowledge about the metadata resides, and that capturing it there avoids inefficiencies and errors being introduced downstream.

Note: Who the authoritative source is depends on the situation, and it can be different parties, even at the same ‘event’ (e.g. the researcher at grant submission; the research funder and the researcher at grant award (‘contract’ stage); the research funder when registering the grant for a DOI; the corresponding author at manuscript submission; the publisher and the author at ‘Version of Record’ (VoR) publication).

Time-specific metadata values
Metadata values can change over time. It is important for key organisational stakeholders (research funders, publishers and institutions) to capture the event-based, time-specific values of ‘their’ authoritative metadata at different ‘events’, so that policy and deal compliance can be verified and insights can be gained for policy and business model development. Interoperability of systems, open APIs, and the ability to store and export data are critical in this respect, because stakeholders want to be able to use all the information they collect and capture throughout the whole workflow.

Examples: the corresponding author at submission can differ from the corresponding author at publication (VoR); an author’s affiliation at the time of doing the research can differ from their affiliation at submission or publication (VoR).
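One minimal way to keep event-based values, instead of overwriting a single ‘current’ value, is to store each value together with the event at which it was captured. The Python sketch below is a hypothetical illustration of the corresponding-author example; the class name, event names and data are invented for the purpose.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TimeSpecificField:
    """Keeps every captured value of one metadata field, keyed by event."""
    name: str
    values: dict[str, str] = field(default_factory=dict)

    def capture(self, event: str, value: str) -> None:
        # Record the value as it stood at this event; values from other events are kept.
        self.values[event] = value

    def at(self, event: str) -> Optional[str]:
        # The value captured at a given event, or None if nothing was captured then.
        return self.values.get(event)

    def changed_between(self, first: str, second: str) -> bool:
        return self.at(first) != self.at(second)


corresponding_author = TimeSpecificField("corresponding_author")
corresponding_author.capture("submission", "Author A")
corresponding_author.capture("vor_publication", "Author B")

print(corresponding_author.at("submission"))       # Author A
print(corresponding_author.at("vor_publication"))  # Author B
print(corresponding_author.changed_between("submission", "vor_publication"))  # True
```

Because the value at submission is not lost when the VoR value arrives, compliance with a policy or deal that applies at a specific event can still be checked afterwards.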

Keeping your own data secure
Organisational stakeholders (research funders, publishers and institutions) should be in control of their own data, and therefore maintain their own ‘metadata store’. This matters for data privacy and security, and it is especially critical when changing vendor systems. To this end, organisations should have an ‘exit strategy’ in case they want to change vendors: they should be able to move their data from the old vendor system to the new one, and the old vendor should be contractually obliged to support that process.

Note: In other industries having the ‘exit scenario’ described in the vendor contract is common practice (or even regulated!).