pharmas and academia join forces
to make data FAIR
FDA/NCTR-MAQC 2022 Conference, 26-27 Sept, 2022
Slides: https://www.slideshare.net/SusannaSansone
Professor of Data Readiness
Associate Director, Oxford e-Research Centre
Interoperability Platform
ExCo
elixir-europe.org
Founding
Academic Editor
nature.com/sdata
datareadiness.eng.ox.uk
Susanna-Assunta Sansone
ORCiD: 0000-0001-5306-5690
Twitter: @SusannaASansone
FAIR
Cookbook
https://faircookbook.elixir-europe.org
A resource open to all!
The FAIR Principles
Globally unique and
persistent identifiers
Community defined
descriptive metadata
Community defined
terminologies
Detailed
provenance
Terms of access
Terms of
use
doi.org/10.2777/986252
www.gov.uk/government/publications/open-
research-data-task-force-final-report
www.fair-access.net.au
doi.org/10.1787/25186167 ark:/48223/pf0000374837
FAIR has aligned the broad community
around common guidelines
doi.org/10.7486/DRI.tq582c863
https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html
FAIR as a driver of the digital transformation
in pharmas
● To improve biopharma R&D productivity
● To enable powerful new AI analytics to
access data for ML and prediction
● Requirements
o financial, technical, training
● Challenges
o change the culture, show business
value, achieve the ‘FAIR enough’ on
an enterprise scale
The FAIR Principles:
a continuum of features, attributes, behaviours
The FAIR Cookbook:
motivations and ambitions
beyond the hype
Motivations
Large body of generic FAIR
guidance
Non-specific guidance for
the life sciences
Lack of practical examples
of ‘how-to’ with different
data types and scenarios
Ambitions
Target specific situations to deliver a guide with
applied examples
Join academia and industry forces to make the
case for FAIR data management
Build capacity for high quality data
management in the private and public sectors
Outline
Overview
Content
Contributors’
perspectives
Validation and
sustainability
Overview
What is it?
A collection of recipes that cover the
operational steps of FAIR data management
Who is it for?
Data Managers,
Data Stewards,
Data Curators
Software
Developers,
Terminology
Managers
Policymakers,
Funders,
Trainers
Researchers,
Data Scientists,
Principal
Investigators
Who is it for?
Data Managers,
Data Stewards,
Data Curators
Software
Developers,
Terminology
Managers
• A venue to document and share existing and new approaches or
services to support FAIRification
• A way to promote a participatory culture that enables sharing of
expertise by getting exposure and credit
Policymakers,
Funders,
Trainers
• Practical examples to
recommend in policies
• To use in educational
material to incentivize and
guide FAIR in practice.
• Introductory material
• Hands-on, technical
step-by-step examples
Researchers,
Data Scientists,
Principal
Investigators
Who developed it?
Almost 100 life sciences professionals, researchers and data managers
FAIRplus
partners
Industry
+
Academia
ELIXIR
Nodes
represented
Current operations and Editorial Board
Content prioritisation
Identification of topics
Review of drafts
Call for contributions
Monthly book-dash
events
Pre-defined focus areas
Breakout on topics
Housekeeping
Technical platform
Website Martin Cook
Dominique Batista
Office of Data
Science Strategy
Coverage and learning objectives
● Over 70 recipes released and more
content available
● Covering technical processes with
FAIRification examples in the life
sciences:
○ Omics
○ pre-clinical
○ clinical areas
But not limited to these!
Learn how to improve FAIRness with exemplar datasets
Understand the levels and indicators of FAIRness
Discover open source technologies, tools and services
Find out the required skills
Acknowledge the challenges
Anatomy of a recipe
components
Ingredients
An idea of tools/skills needed
Step by step process
Guidelines, process, description
Practical
elements, code
snippets
# Python3
# zooma-annotator-script.py file
def get_annotations(propertyType, propertyValues, filters=""):
    ...  # body truncated on the slide
Examples
Conclusions
What should I read next?
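The code-snippet component above shows only the signature of get_annotations from zooma-annotator-script.py. As a minimal sketch of what such a helper might do, the following builds query URLs for the EBI Zooma annotation service. The base URL, the query parameter names and the function name build_annotation_urls are assumptions, not taken from the slide, and no network request is made.

```python
from urllib.parse import urlencode

# Assumed endpoint of the EBI Zooma annotation service (not given on the slide).
ZOOMA_ANNOTATE_URL = "https://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate"


def build_annotation_urls(propertyType, propertyValues, filters=""):
    """Return one Zooma query URL per property value.

    The arguments mirror the slide's get_annotations signature; the
    query-string keys (propertyValue, propertyType, filter) are assumptions.
    """
    urls = []
    for value in propertyValues:
        params = {"propertyValue": value}
        if propertyType:
            params["propertyType"] = propertyType
        if filters:
            params["filter"] = filters
        # urlencode percent-encodes values and turns spaces into '+'.
        urls.append(f"{ZOOMA_ANNOTATE_URL}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    for url in build_annotation_urls("organism", ["Mus musculus", "Homo sapiens"]):
        print(url)
```

Separating URL construction from the HTTP call keeps the sketch testable offline; a real recipe script would fetch each URL and parse the JSON response.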
Links to complementary resources
Current links with and references to:
Tagging recipes with
‘Dataset Maturity Indicators’
Maturity level and indicators
new feature!
https://fairplus.github.io/Data-Maturity
Provide insights into FAIR Maturity reached by
applying a specific recipe to improve a
dataset
Credit and citability
because all contributions matter!
CRediT
attribution ontology
w3id.org/faircookbook/FCB006
The Cookbook platform
open source community practices
Built using jupyter-book, following the practice used by the
Alan Turing Institute’s The Turing Way book, an open source
community-driven guide to reproducible, ethical, inclusive
and collaborative data science
Technology stack:
● GitHub for version control and hosting
● Markdown for the write-up
● HackMD Markdown editor, integrated with GitHub
● Jupyter Notebook for executable code
● Binder for the web execution of Jupyter notebooks
distributed with a recipe
● Mermaid JavaScript library for flowcharts, Gantt charts
and pie charts
https://the-turing-way.netlify.app/welcome
The FAIRness of the Cookbook
Accessibility
HTTPS protocol
Interoperability
JSON-LD markup
Cross-links to objects in other
registries
From the ELIXIR ecosystem and beyond!
CRediT attribution ontology
Reusability
CC BY 4.0 license for all
content
Findability
Sitemap.xml, JSON-LD
Markup with Schema.org,
Bioschemas
w3id.org unique persistent
identifiers for each recipe
ORCID for authors
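The findability and interoperability features listed above (Schema.org JSON-LD markup, w3id.org identifiers, ORCID iDs, CC BY 4.0) can be sketched together as a single metadata record. This is only an illustration: the "HowTo" type, the property selection, the recipe title and the author are placeholders, and the ORCID iD is the sample iD used in ORCID's own documentation, not a Cookbook author's.

```python
import json


def recipe_jsonld(recipe_id, name, author_name, author_orcid):
    """Build an illustrative Schema.org JSON-LD record for one recipe."""
    recipe_url = f"https://w3id.org/faircookbook/{recipe_id}"
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",  # assumed type; the slides do not name one
        "@id": recipe_url,
        "identifier": recipe_url,  # persistent w3id.org identifier
        "name": name,
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "author": {
            "@type": "Person",
            "@id": f"https://orcid.org/{author_orcid}",
            "name": author_name,
        },
    }


if __name__ == "__main__":
    record = recipe_jsonld(
        "FCB006",                  # identifier pattern taken from the slides
        "Placeholder recipe title",  # hypothetical title
        "Example Author",          # hypothetical author
        "0000-0002-1825-0097",     # ORCID's documentation sample iD
    )
    print(json.dumps(record, indent=2))
```

Embedding such a record in a page inside a script element of type application/ld+json is the usual way to expose it to search engines and registries.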
Content
Over 70 recipes and growing
Content type overview
Recipes focused on
each technical aspect,
and applicable to any
data type
Recipes specific to a
topic or data type
Define what your needs are
Goal: improving visibility of content, e.g.:
Goal: semantic integration of datasets from multiple sources, e.g.:
Goal: security and regulatory compliance, e.g.:
https://w3id.org/faircookbook/FCB010
https://w3id.org/faircookbook/FCB007
https://w3id.org/faircookbook/FCB006
https://w3id.org/faircookbook/FCB020 https://w3id.org/faircookbook/FCB004
https://w3id.org/faircookbook/FCB014 https://w3id.org/faircookbook/FCB035
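Recipe references appear on the slides in several forms (with https, with http, or without a scheme). A small helper can normalise them to one canonical persistent URL. This is a sketch only: the function name recipe_url is hypothetical and the three-digit FCB pattern is inferred from the identifiers shown here, not from a published specification.

```python
import re

W3ID_BASE = "https://w3id.org/faircookbook/"

# Optional scheme and w3id prefix, followed by an identifier such as FCB010.
_REFERENCE = re.compile(r"(?:(?:https?://)?w3id\.org/faircookbook/)?(FCB\d{3})")


def recipe_url(reference):
    """Return the canonical https w3id URL for a recipe reference."""
    match = _REFERENCE.fullmatch(reference.strip())
    if match is None:
        raise ValueError(f"not a FAIR Cookbook recipe reference: {reference!r}")
    return W3ID_BASE + match.group(1)


if __name__ == "__main__":
    for ref in ["FCB010", "http://w3id.org/faircookbook/FCB044",
                "w3id.org/faircookbook/FCB006"]:
        print(recipe_url(ref))
```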
Content type overview
http://w3id.org/faircookbook/FCB044
http://w3id.org/faircookbook/FCB042
Content type overview
https://w3id.org/faircookbook/FCB050
https://w3id.org/faircookbook/FCB049
Contributors’ perspective
What has motivated so many people
to contribute?
● To stay engaged in a growing community and updated with the latest
developments
● To prove their FAIR competence (as individuals and as organisations)
● To expand their network of potential collaborators, clients and users
● To address common challenges and find common solutions at
pre-competitive level
Inclusion criteria
Proof of competence, e.g.:
● Involved in data-centric projects as data managers
● Authored publications, datasets, reports, technical documentation...
● (For industry) provided services as data curators, data vendors...
Become part of a
community of FAIR experts!
1 Identify a chapter and a topic
Findability Accessibility Interoperability Reusability
Infrastructure Applied examples Assessment
2 Choose a way of contributing and see our guidelines
Google Docs
HackMD
Git
Markdown cheat sheet
Get recipe template
Tips and tricks
3 Submit an outline
You can discuss it with the Editorial Board
What to contribute?
• Contribute test datasets
• Contribute examples
• Contribute code
Found a coverage gap?
Want to share your expertise?
Topics of interest:
• ETL to RDF solutions
• KG generation from unstructured data
• GDPR-compliant document
• Input validation & RDF shape constraint
validation
• Specific FAIRification process
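To give a flavour of the first topic above, ETL to RDF, here is a minimal sketch that turns CSV rows into N-Triples using only the Python standard library. The namespaces, the column names and the function name csv_to_ntriples are hypothetical; an actual recipe would more likely map columns to community ontologies and use a dedicated RDF library.

```python
import csv
import io
from urllib.parse import quote

BASE = "https://example.org/sample/"   # hypothetical subject namespace
VOCAB = "https://example.org/vocab/"   # hypothetical predicate namespace


def escape_literal(text):
    """Escape backslash, quote and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(csv_text, id_column):
    """Emit one triple per non-empty cell; the id column names the subject."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the id column, surplus unnamed cells and empty values.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{VOCAB}{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    data = "id,organism,tissue\ns1,Mus musculus,liver\ns2,Homo sapiens,\n"
    print("\n".join(csv_to_ntriples(data, "id")))
```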
What a good recipe should
and should not be
Should
Specific • Should target a specific task/action towards the improvement of FAIR indicators, e.g. mapping data to a certain ontology, converting files from one format to another, anonymising a dataset for better reusability
Complete • Should be an end-to-end recipe that a user can follow to finish a task
FAIR • The tools used should be open or, if proprietary, a “free” or “community” version should be available (maybe with limited functions/support) and be FAIR
Should Not
Too broad • Should not be a repeat of a full user manual
Too high level • Should not be a features list of a tool • Should not be an advertisement
Incomplete • Should not be just a teaser that only shows a few steps at the beginning
Closed • Should not require the user to purchase paid software before testing it
Validation and sustainability
Utility and value based on three uses:
what we have learned
● As educational material on FAIR in a training context (as part of the
FAIRplus Fellowship Programme):
o showed the validity of the recipes’ content towards the intended
(learning) objectives
o confirmed that some recipes require a greater amount of technical
background knowledge and have a steeper learning curve
● As practical guidance to improve day-to-day tasks for FAIRer data
● As a contributor towards changing the culture in research data
management (behind the pharmas’ firewall)
o outcomes were expressed in terms of satisfaction with the value of the
recipes against specific tasks or challenges addressed
o they reported a positive contribution towards their discussion on return
on investment to operationalize FAIR
Sustainability: working on four fronts
Content Infrastructure Embedding Endorsements
Cultivating the
collective knowledge
Light weight
Thanks to
Editorial Board
Section Editors
FAIRplus partners
All bookdashes’ participants
All authors
fairplus-cookbook@elixir-europe.org
faircookbook.elixir-europe.org
fairplus-project.eu
This project has received funding from the Innovative Medicines Initiative Joint Undertaking under grant agreement No 802750. This Joint Undertaking
receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA Companies. This communication reflects the
views of the authors and neither IMI nor the European Union, EFPIA or any Associated Partners are liable for any use that may be made of the
information contained herein.
