gcbr-match

This repository contains the code used to identify mentions of Global Core Biodata Resource (GCBR) names and their corresponding accession numbers in a corpus of scientific texts. It was developed by the SIBiLS team at the Swiss Institute of Bioinformatics, as part of a study comparing the amount of GCBR information in main manuscripts with that in supplementary data files.

Repository Contents

- src/: contains the code used for the matching process.
  - collection_sample.json: a sample collection of texts, stored as a list of dictionaries, each with a document identifier ("_id") and its text ("text").
  - GCBRs.csv: a tab-delimited file listing the GCBR names, a regular expression for matching each name, and a list of zero or more regular expressions for matching accession numbers, taken from identifiers.org.
  - match.py: script for identifying GCBR names and accession numbers in a collection.
- README.md: this file, providing an overview of the repository.
- LICENSE: MIT License, Apache 2.0.
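The two input files described above can be read with the Python standard library alone. This is a minimal sketch, assuming that each row of GCBRs.csv holds the GCBR name, the name regex, and then zero or more accession regexes in separate tab-separated columns, and that the file has no header row unless `has_header=True` is passed. The `GCBR` class and the `load_gcbrs` and `load_collection` helpers are hypothetical and are not taken from match.py.

```python
import csv
import io
import json
import re
from dataclasses import dataclass, field


@dataclass
class GCBR:
    name: str
    name_pattern: re.Pattern
    accession_patterns: list = field(default_factory=list)


def load_gcbrs(handle, has_header=False):
    """Parse a tab-delimited GCBR table.

    Assumed (hypothetical) column layout: GCBR name, name regex,
    then zero or more accession-number regexes in the remaining columns.
    """
    # QUOTE_NONE keeps any quote characters inside a regex as literal text
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # skip the header row, if the file has one
    gcbrs = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        name, name_regex, *accession_regexes = row
        gcbrs.append(
            GCBR(
                name=name,
                name_pattern=re.compile(name_regex),
                # empty cells are ignored, so a GCBR may have no accession patterns
                accession_patterns=[re.compile(p) for p in accession_regexes if p],
            )
        )
    return gcbrs


def load_collection(handle):
    """Load a JSON list of {"_id": ..., "text": ...} dictionaries into {_id: text}."""
    return {doc["_id"]: doc["text"] for doc in json.load(handle)}


# Small in-memory stand-ins for GCBRs.csv and collection_sample.json (illustrative data).
table = io.StringIO(
    "\t".join(["UniProt", r"\bUniProt(?:KB)?\b", r"\b[OPQ][0-9][A-Z0-9]{3}[0-9]\b"]) + "\n"
)
collection = io.StringIO('[{"_id": "doc1", "text": "Sequences were taken from UniProt."}]')

gcbrs = load_gcbrs(table)
docs = load_collection(collection)
```

Keying the collection by "_id" makes it easy to report matches per document, at the cost of silently keeping only the last text if an identifier were duplicated.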

Usage

Names and accession numbers matching:
```
python src/gcbr_match/match.py
```
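Conceptually, the matching step scans every document with each GCBR's name pattern and accession patterns. The sketch below shows one way to count such matches per document; the `match_collection` function, its input shapes, and its output format are hypothetical and do not describe what match.py actually prints or writes.

```python
import re


def match_collection(docs, gcbrs):
    """Count GCBR name and accession-number matches in each document.

    docs:  {doc_id: text}
    gcbrs: {gcbr_name: (compiled name regex, [compiled accession regexes])}
    Returns {doc_id: {gcbr_name: {"names": int, "accessions": int}}},
    listing only GCBRs with at least one hit in that document.
    """
    results = {}
    for doc_id, text in docs.items():
        hits = {}
        for gcbr, (name_re, accession_res) in gcbrs.items():
            # finditer yields one match object per non-overlapping match
            n_names = sum(1 for _ in name_re.finditer(text))
            n_accessions = sum(1 for p in accession_res for _ in p.finditer(text))
            if n_names or n_accessions:
                hits[gcbr] = {"names": n_names, "accessions": n_accessions}
        results[doc_id] = hits
    return results


# Illustrative patterns and documents, not taken from GCBRs.csv.
gcbrs = {
    "UniProt": (
        re.compile(r"\bUniProt(?:KB)?\b"),
        [re.compile(r"\b[OPQ][0-9][A-Z0-9]{3}[0-9]\b")],
    )
}
docs = {
    "doc1": "Sequences were taken from UniProt (P69905, P68871).",
    "doc2": "No database is cited here.",
}
results = match_collection(docs, gcbrs)
# {'doc1': {'UniProt': {'names': 1, 'accessions': 2}}, 'doc2': {}}
```

Running the same function over a manuscript collection and a supplementary-file collection would give directly comparable per-GCBR counts.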

Notes and Limitations

The primary goal of the study is not to exhaustively identify every mention of a GCBR, but to compare the relative amount of GCBR information in supplementary data files with that in manuscripts. For this purpose, we rely on a simple methodological approach that prioritizes high precision (minimizing false positives) over maximal recall (capturing all possible instances).
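One precision concern is that accession patterns from identifiers.org are written to validate a whole identifier (they are typically anchored with ^ and $), so they cannot be searched in running text as-is, and simply removing the anchors lets them match inside longer tokens. The sketch below shows one possible adaptation: drop the anchors and reject matches that touch neighboring letters, digits, or underscores. It assumes nothing about how match.py handles this; `to_text_pattern` is a hypothetical helper and the PDB-style pattern is illustrative.

```python
import re

# Characters treated as part of an identifier for the boundary checks (an assumption).
_ID_CHAR = r"[A-Za-z0-9_]"


def to_text_pattern(anchored: str) -> re.Pattern:
    """Turn a whole-string pattern (^...$) into one that searches running text.

    The ^/$ anchors are dropped and replaced by lookarounds that reject matches
    glued to other identifier characters. This simple check does not handle an
    escaped backslash directly before the final $.
    """
    body = anchored
    if body.startswith("^"):
        body = body[1:]
    if body.endswith("$") and not body.endswith("\\$"):
        body = body[:-1]
    # Wrapping the body in (?:...) keeps top-level alternatives inside the boundaries.
    return re.compile("(?<!" + _ID_CHAR + ")(?:" + body + ")(?!" + _ID_CHAR + ")")


text = "PDB entry 1A3N, sample 12345"
naive = re.findall(r"[0-9][A-Za-z0-9]{3}", text)                   # ['1A3N', '1234']
bounded = to_text_pattern(r"^[0-9][A-Za-z0-9]{3}$").findall(text)  # ['1A3N']
```

Boundary checks remove only one class of false positives: a four-character pattern like this one still matches a year such as 2020, so short accession formats remain a source of noise.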

Contributing

If you would like to contribute to this project, please feel free to:

- Submit bug reports or feature requests as issues.
- Fork the repository and submit pull requests with your changes.

License

MIT License
