The Data Quality Challenge calls on the community to revolutionise publication (meta)data management. By enhancing data quality and completeness, research funders, institutions, consortia, and publishers will be empowered to make informed decisions based on accurate and robust information.

This initiative is enabled by OA Switchboard, builds on community-wide lessons learned and best practices, and is grounded in OA Switchboard's core values of Trust, Collaboration, and Efficiency.
Your path to continuous improvement starts here.

Accurate and efficient “each stage: first time right”
The Data Quality Challenge invites you to lead the change
Real-world successes in upstream metadata
Complete and correct metadata for author affiliations and research funding are essential. The most effective approach? Fix it upstream.
Integrate metadata creation (collection, quality control, and updating) into your systems and workflows before publication, and relay it throughout the editorial, production, and publication processes.
Below, we've gathered real-life examples of publishers putting this into practice. Use these best practices as inspiration for strengthening your own approach. Get inspired!
EMS Press best practice
Structured from the start: capturing metadata via ‘manuscript extraction’ as early as submission, and building on globally valid identifiers wherever possible (ROR IDs, DOIs, ORCIDs)
- EMS Press authors generally submit in TeX format, which enables parsing the source of a submission and extracting all of the information needed.
- The EMS Press publication process builds on globally valid identifiers wherever possible: ROR IDs, DOIs, zbMATH Identifiers, MRIDs, and ORCIDs. They continuously develop tooling and incorporate data from other metadata providers, such as the ROR registry, as reliable sources of truth.
- After peer review, at handover to the typesetter, custom LaTeX stylefiles are provided to semantically annotate each publication. Organization information is annotated with ROR IDs at the source level.
- Source files are parsed at the time of publication using both open-source and custom-built tools, and the extracted information is cross-referenced with their production database to provide feedback to the team responsible for publishing and quality control. The continuously enriched in-house data lake informs and feeds tools and processes for the next submission-to-publication process.
- By considering different output formats at the moment of submission, EMS Press can treat JATS, PDF, and other output formats as equal transformations of the source file. Global identifiers keep publications and metadata identifiable across different systems, wherever the data ends up being used.
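EMS Press's stylefiles and parsers are not public, so the following is only a sketch of the pattern described above: affiliations semantically annotated in the TeX source with a hypothetical `\affiliation` macro (the macro name and its ROR ID slot are assumptions, not EMS Press's actual markup), extracted by a parser at publication time.

```python
import re

# Hypothetical annotation: \affiliation{Name}{ror-id} -- EMS Press's actual
# stylefile macros are not public; this is an illustrative stand-in.
AFFILIATION_RE = re.compile(r"\\affiliation\{([^}]*)\}\{([0-9a-z]{9})\}")

def extract_affiliations(tex_source: str) -> list[dict]:
    """Pull (name, ROR ID) pairs out of semantically annotated TeX source."""
    return [
        {"name": name, "ror": f"https://ror.org/{ror_id}"}
        for name, ror_id in AFFILIATION_RE.findall(tex_source)
    ]

sample = r"\affiliation{Example University}{0abcdef12}"  # made-up ROR ID
print(extract_affiliations(sample))
```

In the real pipeline the extracted records would then be cross-referenced with the production database rather than simply printed.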


eLife best practice
Capturing affiliations at submission with ‘author select’, ensuring that ROR IDs are introduced early and verified before publication, coupled with a quality assurance process during proofing
- In the submission system eJP, the author is presented with a ‘Search for Organizations’ widget to add an affiliation. Subject to their country and auto-completion, the affiliation and corresponding ROR ID are added.
- The ROR IDs are exported in the package that is handed over to their vendor, Kriyadocs.
- During proofing, a quality assurance process, involving automated XML validation, queries against ROR's public API, and manual checks by production vendors and in-house eLife staff, ensures that missing ROR IDs are introduced and that content and metadata are both complete and correct.
- The ROR IDs are included in the JATS XML for the published articles, which are deposited to Crossref.
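As a hint of what the automated part of such a QA step can look like (this is not eLife's actual tooling), a ROR ID can be screened syntactically before the record is confirmed against ROR's public API. Per ROR's documented ID structure, an ID is nine characters: a leading 0, six Crockford base32 characters (no i, l, o, or u), and two final check digits.

```python
import re

# Structure per ROR's documentation: leading zero, six Crockford base32
# characters (excluding i, l, o, u), two trailing check digits.
ROR_ID_RE = re.compile(r"^0[a-hj-km-np-tv-z0-9]{6}\d{2}$")

def is_wellformed_ror_id(ror_id: str) -> bool:
    """Cheap syntactic gate before querying ROR's public API."""
    return bool(ROR_ID_RE.fullmatch(ror_id))

# A full check would then confirm the record actually exists, e.g. via
#   GET https://api.ror.org/organizations/<ror_id>   (not run here)

print(is_wellformed_ror_id("02mhbdp94"))  # format-valid (made-up) ID
print(is_wellformed_ror_id("12345"))      # too short
```

A syntactic check like this catches truncated or mistyped IDs early; only IDs that pass need a network round-trip to the registry.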
American Society for Microbiology best practice
Combining AI-powered submission tools with editorial oversight via expert manual checks
- Authors submit manuscripts via ChronosHub and receive AI-powered assistance with metadata entry and PID matching.
- The Ringgold IDs are exported as part of the author affiliation metadata in the MECA-compliant XML package that is passed to Kriyadocs at acceptance.
- At handover, Kriyadocs extracts the author affiliations from the manuscripts. These data are reconciled against, and combined with, the submission metadata.
- Newly captured institutions are validated automatically against the Ringgold database, using both the main institution name and the location (city and country). If there is a match, the Ringgold ID is captured.
- If there is no match, a query is inserted for copyeditors to analyse the affiliation. These staff members try to resolve name mismatches and spelling anomalies so that more Ringgold IDs can be assigned to author affiliations.
- Ringgold IDs are captured in the published JATS XML and deposited to Crossref.
- Ringgold IDs are transformed into ROR IDs for certain downstream deliverables.


Rockefeller University Press best practice
Maintaining ROR IDs across the full publishing workflow, from ‘author select’ at submission through metadata deposits upon publication
- In eJP, the author is presented with a ‘Search for Organizations’ widget to add an affiliation. Subject to their country and auto-completion, the affiliation and corresponding ROR ID are added.
- These ROR IDs persist throughout the workflow. Staff ensure that authors have applied RORs to the manuscript affiliations and to those checked for Read & Publish deal eligibility, and that all RORs are included in the metadata exported to their vendor, TNQ, who merges the data into the JATS XML for the article.
- Upon publication, Silverchair includes the RORs in downstream deposits to Crossref and others.
American Chemical Society best practice
Multi-method PID matching with near-complete coverage
- ACS applies a suite of methods at submission to link a Ringgold ID to an author affiliation, including:
  - Extraction from the submitted manuscript through the ACS Publishing Center, powered by ChronosHub.
  - Presenting the submitting author with a pick-list of suggestions based on the extraction data and/or their profile.
  - Applying real-time proprietary algorithms as a submitting author types free-text affiliation(s).
- This gets ACS to 96–97% of affiliations with a Ringgold ID.
- The Ringgold IDs are stored in the in-house production system and data lake, and quality assurance processes ensure consistency between content and underlying metadata throughout the workflow. Coverage of the remaining affiliations is achieved by involving humans in combination with tools.
- The Ringgold IDs are included in the JATS XML and the metadata file for the published articles, which are deposited to Crossref.
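ACS's matching algorithms are proprietary; purely as a generic illustration of the pick-list step, standard-library fuzzy matching can rank suggestions while an author types a free-text affiliation (the organization list below is made up).

```python
import difflib

# Hypothetical candidate pool; in production this would be the Ringgold-backed
# organization index, not a hard-coded list.
ORGANIZATIONS = [
    "Example University",
    "Example State University",
    "Institute of Examples",
    "Sample Polytechnic",
]

def suggest(free_text: str, n: int = 3) -> list[str]:
    """Rank pick-list suggestions for a free-text affiliation, best first."""
    by_lower = {org.lower(): org for org in ORGANIZATIONS}
    hits = difflib.get_close_matches(free_text.lower(), list(by_lower), n=n, cutoff=0.4)
    return [by_lower[h] for h in hits]

print(suggest("exmple universty"))  # typo-tolerant: best match listed first
```

A production matcher would weigh more signals (location, author profile, prior submissions), but the interaction pattern is the same: free text in, ranked identifiers out.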


Pensoft best practice
AI-assisted extraction with human review and in-house metadata validation
- A proprietary tool extracts the metadata from the submitted manuscript (assisted by AI), and authors check and edit this information.
- Throughout peer review and production, in-house staff ensure consistency between the information in the manuscript/article and the structured metadata.
- Included in the JATS XML for authors: ORCID iDs and CRediT roles (ROR IDs for affiliations coming soon).
- Included in the JATS XML for funders: ROR IDs and DOIs.
Beilstein-Institut best practice
Post-acceptance metadata QA through automation and expert review
- Automated extraction of author affiliation information from the manuscript after acceptance*).
- Human involvement from in-house staff to search for and confirm the correct affiliation–ROR ID combination.
- Quality assurance for consistency between content and structured metadata is an integral part of the editorial, production, and publication workflow and system.
- Beilstein-Institut includes ROR IDs in their JATS XML, as well as ORCID iDs, CRediT roles, and funder DOIs; after an article is published, the ROR IDs are included in Crossref deposits.

*) Beilstein-Institut has a proprietary submission, peer review, production, and publication system.

The Royal Society best practice
Embedding metadata in OA payment and agreement workflows
The Royal Society leverages CCC RightsLink to manage their OA program, including agreements. In doing so:
- The accepted manuscript metadata and author affiliation IDs are passed via API from the editorial system into RightsLink.
- RightsLink leverages Ringgold IDs in the metadata, automatically checking the author's Ringgold affiliation alongside other deal-eligibility criteria to assign manuscripts to the appropriate deal, apply discounts and waivers, and fulfil institutions' reporting needs. Other organizational IDs are also supported.
- RightsLink stores the precise identifiers that drove the match to an agreement, and those IDs are shared with publishers and their institutional customers in detailed agreement reporting.
- This affiliation metadata can be automatically passed to other third-party systems via API or custom connectors. Here is a sample payload:
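The sample payload itself is not reproduced in this text. Purely as an illustration of the kind of affiliation metadata involved, such a payload might look like the following; every field name and value here is an assumption for illustration, not RightsLink's actual schema.

```python
import json

# Hypothetical payload shape -- all field names and values are made up for
# illustration only; they are not RightsLink's actual API schema.
payload = {
    "manuscript_id": "RS-2024-0001",    # made-up identifier
    "journal": "Example Journal",
    "corresponding_author": {
        "orcid": "https://orcid.org/0000-0000-0000-0000",  # placeholder
        "affiliation": {
            "name": "Example University",
            "ringgold_id": 123456,      # the identifier that drove the match
            "ror_id": "https://ror.org/0abcdef12",
        },
    },
    "agreement": {"type": "Read & Publish", "eligible": True},
}

print(json.dumps(payload, indent=2))
```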


Our Data Quality Fall 2024 Campaign has officially begun!
For all stakeholders in the research ecosystem, poor metadata poses significant risks and knock-on effects to research integrity, discoverability, and operational excellence.
Good-quality metadata empowers stakeholders. Based on our core values of Trust, Collaboration, and Efficiency, we launched the Data Quality Challenge in 2023 to call on the community to revolutionise publication (meta)data management.

Read more in our September 2024 blog post.
The focus of our current campaign is specifically on publishers
For publishers, high-quality metadata are the building blocks of any commercial arrangement and business modelling. The consequences of not addressing metadata issues are significant: alongside the well-documented impact on discovery, usage, and impact, publishing values will be undermined, reputations damaged, and business opportunities jeopardized. In summary, the risks of poor metadata practices are (in alphabetical order):
- Harder contract negotiations and customer dissatisfaction
- Loss of control in the open data ecosystem
- Misattribution of scholarly work
- Non-compliance with industry standards
- Operational burden and higher cost
- Reduced discoverability, usage, impact and submissions

Now the good news…
We have talked about the risks, but let's summarise the benefits for publishers of investing in improving metadata quality:
- Business Insights: Enhanced management information and better opportunities for portfolio management
- Customer Insights: Greater insights into your customers and authors
- Stakeholder and Customer Support: Improved services for research funders, institutions, and consortia
- System Integration: Full integration into the scholarly communications ecosystem, resulting in greater potential impact
- Research Integrity: Support for upstream editorial and research-integrity objectives, and thus better support for your authors
- Discoverability: Enhanced visibility to improve readership and citation
Join us and commit to improving metadata quality for all.​
Data Quality Challenge information below created: October 2023
The Data Quality Challenge calls on the community to revolutionise publication (meta)data management by standardising information provided by publishers and allowing stakeholders to consolidate and process data efficiently.
By enhancing data quality and completeness, research funders, institutions, consortia, and publishers will be empowered to make informed decisions based on accurate and robust information.
This initiative is enabled by OA Switchboard, builds on community-wide lessons learned and best practices, and is grounded in OA Switchboard's core values of Trust, Collaboration, and Efficiency.
Your path to continuous improvement starts here.
Aims
- To work collaboratively across the community to improve publication (meta)data quality and completeness.
- To standardise the provision of data across publishers to ease consolidation and processing.
- To empower the community to make evidence-based decisions.
- To increase efficiencies across the community.
- To make this a continuous improvement process to meet emerging standards as the industry moves forward.

The overall goals of the Data Quality Challenge are to:
Standardise & Aggregate:
Ensure the publisher community provides data in the same format, making collection and storage in their own systems by research funders and institutions more efficient and trustworthy.


Import & Analyse:
Allow the community to create meaningful analyses and their own reporting from the data provided, and enable them to import these data into their own systems to maintain correct and complete records.
Enhance:
Enable the community to enhance and develop the data provided by publishers to support further internal workflows, decision-making and reporting.

For Publishers: Lead the way by committing to continuous data improvements
- Demonstrate support for this collaborative, cross-industry initiative and create positive brand associations
- Invest in your future by committing to continuous data improvements
- Increase efficiencies: getting it right first time is cheaper and more efficient than fixing it later
- Improve internal management information
- Develop more effective integration with the complex OA ecosystem
- Improve customer relations by supporting their data needs

For Research Funders and Institutions:
Join the collaborative effort and achieve change
- Be part of the solution to improve data quality and completeness
- Be part of the future for collaborative, cross-industry progress
- Support the call to action by communicating to publishers the problems caused by inaccurate and non-standardised data
- Promote OA Switchboard to publishers and tell them about the Data Quality Challenge
- Encourage your publishers to collaborate by explaining how improved data also benefits them
