The Small Blanket
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3018688, IEEE Access
ABSTRACT In the Big Data era, there is an increasing need to fully exploit and analyze the huge quantity
of information available about health. Natural Language Processing (NLP) technologies can contribute by
extracting relevant information from unstructured data contained in Electronic Health Records (EHR) such
as clinical notes, patients’ discharge summaries and radiology reports. The extracted information can help
in health-related decision-making processes. The Named Entity Recognition (NER) task, which detects
important concepts in texts (e.g., diseases, symptoms and drugs), is crucial in the information extraction
process yet has received little attention in languages other than English.
In this work, we develop a deep learning-based NLP pipeline for biomedical entity extraction in Spanish
clinical narratives. We explore the use of contextualized word embeddings, which incorporate context
variation into word representations, to enhance named entity recognition in Spanish-language clinical text,
particularly of pharmacological substances, compounds, and proteins. Various combinations of word and
sense embeddings were tested on the evaluation corpus of the PharmacoNER 2019 task, the Spanish Clinical
Case Corpus (SPACCC). This data set consists of clinical case sections extracted from open access Spanish-
language medical publications.
Our study shows that our deep learning-based system with domain-specific contextualized embeddings,
coupled with stacking of complementary embeddings, outperforms a system that integrates standard,
general-domain word embeddings. With this system, we achieve performance competitive with the
state of the art.
INDEX TERMS Clinical case narratives, Contextualized word embeddings, Deep learning, Language
representations, Named entity recognition, Natural language processing, Spanish language
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
Akhtyamova et al.: Testing Contextualized Word Embeddings to Improve NER in Spanish Clinical Case Narratives
of Neural Network (NN)-based methods, particularly those based on Deep Learning, over traditional Machine Learning (ML) algorithms. However, beyond the development of new NN-based methods, researchers have started to explore the impact of improved strategies for representing the text provided as input to both NN-based and other ML methods.

Starting from Bag of Words (BoW) representations, word pre-processing has evolved to include more sophisticated word representations such as word2vec [6], GloVe [7] and FastText [8] embeddings, the last of which also captures subword information from texts. Applied to a range of NLP tasks, methods using word embeddings have led to significant breakthroughs in model performance for biomedical NER tasks where limited training data is available [9].

Further advances in text preprocessing have been proposed based on language models, which give a word a different embedding vector depending on its usage context. The embedding function is trained either from a language modeling perspective [10] or by recovering masked parts of the input tokens [11]. Downstream tasks that incorporate these embeddings are considered to be learned in a semi-supervised manner because they benefit from large amounts of unlabeled data [12], [13].

Language representation models can further be applied, with or without fine-tuning, to problems arising in different domains¹. The approach of learning on one dataset and applying the model to another dataset is called Transfer Learning.

Among recently introduced contextualized embeddings are Semi-supervised Sequence Learning [14], ELMo [10], ULMFiT [13], the OpenAI transformer [15], the Transformer [16], BERT [11] and Flair [17].

In our experiments, we explore the use of both Flair and BERT contextualized embeddings, as they have been shown to outperform other types of embeddings on a variety of sequence labeling tasks [11], [17].

In addition to pre-trained domain-specific Spanish FastText embeddings [18], we generate domain-specific Spanish contextualized embeddings by pre-training language representation models on a corpus retrieved from the Scientific Electronic Library Online (SciELO) website². The clinical case narratives from the publications hosted there were used to construct the PharmacoNER dataset. To the best of our knowledge, these are the first contextualized embeddings for Spanish clinical texts made available to a wide audience. The large corpus of more than one billion sentences from SciELO that we make available is itself a valuable resource.

This paper extends and deepens a preliminary version of our experiments, described in [19]. In particular, we add experiments using the Flair framework, which outperform our previous results obtained with the BERT model. We also experiment with word embedding stacking approaches, further improving the results we obtained on the PharmacoNER corpus.

The contributions described in this paper are as follows: (1) we retrieve task-specific corpora for training; (2) we construct task-specific contextualized word embeddings from scratch based on the Flair and BERT architectures; (3) we compare model performance across the constructed word embeddings, explore how these may be combined with other types of embeddings, and compare them with standard embeddings, producing new baselines; and (4) we conduct an extensive error analysis, checking the sources of errors for different models.

The pretrained weights for the Flair and BERT models, as well as the SciELO corpora used for their training, are made publicly available in a Google Drive repository³.

¹ https://ai.googleblog.com/2019/07/advancing-semi-supervised-learning-with.html
² https://scielo.org/es
³ http://dx.doi.org/10.17632/vf6jmvz83b.3file-75396370-ff40-4a7c-bdfb-d178414bf9b0

A. BIOMEDICAL ENTITY EXTRACTION APPROACHES
Simple approaches to biomedical NER, which sometimes give surprisingly good results, have made use of rules or dictionaries. For example, Eftimov et al. [20] built a set of regular expressions to extract evidence-based dietary recommendations from scientific publications and websites. They first detected target mentions in textual data and then extracted them using the rule-based technique.

Various strategies for dictionary lookup have also been shown to be effective [21]. Such approaches leverage biomedical terminology resources or ontologies. They are particularly relevant for biomedical NER, where named entities often correspond to fine-grained domain-specific concepts.

However, with the development of automatic NLP methods, these methods are rarely applied on their own to solve NER tasks. Instead, they are used to generate features that feed ML and deep learning (DL) models. For example, in the recent Meddocan challenge on Spanish medical document anonymization [22], rule-based techniques were actively used within ML and DL methods to identify patients' email addresses, locations, phone numbers, etc. In addition, participants in the challenge used domain- and language-specific gazetteers and Brown clusters derived through unsupervised ML. For example, Perez et al. [23] concluded that Brown clusters and gazetteers played a significant role in ML system performance. Further, Lopez et al. [24] tested both ML and rule-based approaches and concluded that a hybrid of the two gives the best result.

Lee et al. [25] solve the problem of biomedical NER in two steps. First, they discover entity boundaries using Support Vector Machines (SVM). They then apply an ontology-based hierarchical classification method to classify the identified entities. Their system achieved a promising F-score of 66.7% on the GENIA corpus [26].

Early work on machine learning-based NER includes such techniques as reranking relying on kernels [27] as well as
pure feature processing [28]. Kernel-based methods for entity extraction, such as the SVMs used in numerous papers [29]–[31], became popular for extracting entities from texts, including biomedical texts [32]. In the latter paper, the authors examined different kernel functions for the problem of biomedical NER. They concluded that a tree-based kernel is better suited to entity extraction.

Current state-of-the-art methods for NER are based on NN architectures, in particular DL convolutional NNs (CNN) and recurrent NNs (RNN). Transfer learning approaches, in particular the use of pre-trained contextualized word embeddings, have improved the performance of these methods and give strong results in a number of downstream tasks.

For example, in the Meddocan shared task the best result was achieved by a system that fed pretrained contextualized Flair embeddings into a simple RNN model. However, for more complex biomedical NER problems involving long, discontinuous or overlapping entities, hybrid approaches show the best results. Li et al. [33] integrated knowledge base (KB) embeddings into their tree-structured LSTM framework, achieving a gain of approximately 3% in F-score.

Related to this, contextualized word embeddings together with part-of-speech (PoS) tags were examined for Bulgarian NER [34], showing sizeable improvements over the state of the art. In another work, combinations of different types of contextualized embeddings were explored over English biomedical literature corpora [35]. The best results were obtained when combining ELMo and Flair word embeddings.

Another relevant work addresses the extraction of adverse drug events from the 2018 N2C2 shared task corpus [36]. The authors experimented with the off-the-shelf Flair NER framework and with kernel-based methods. They concluded that a neural Flair-based approach outperforms standard SVM-based methods. In the work of Basaldella et al. [37], the authors pretrained ELMo and Flair contextualized word embeddings on health forums within Reddit and applied them to health social media data for various NER problems. They concluded that domain-specific contextualized word embeddings strongly influence performance on downstream tasks. When applied to user-generated content, these embeddings outperformed embeddings trained on either general-purpose data or scientific papers. Our experiments are closely related to this work.

An extensive overview of recent advances in the NLP field can be found in the work of Minaee et al. [?]. Although it focuses on document classification, it describes several methods, such as transformers, that are directly applicable to NER. Young et al. [?] cover in detail a key element of the current paper, namely distributed and contextualized word representations, among other recent trends in NLP.

B. SPANISH CLINICAL TEXT PROCESSING
Spanish is an inflectional language with a richer morphology than English. Its morphemes denote several syntactic, semantic and grammatical features of words (such as gender, number, etc.). From a syntactic point of view, Spanish texts have more subordinate clauses and long sentences with high word order flexibility. For instance, the subject can be located in any position in a sentence instead of only before the verb.

There are a number of peculiarities of clinical texts in Spanish. Because English biomedical terms are translated, there are more variants of anglicisms. Some of them are freely adapted and others are exact copies of the original ones. For instance, “interleukin” is translated as “interleukina”/“interleucina”/“interleuquina”. Moreover, Spanish uses accent marks, which do not exist in English, and whether or not they are used generates lexical variants. For instance, “period” may be rendered as “periodo” or “período”. Adjectives ending in “-al” sometimes keep their form when translated and sometimes follow Spanish morphological rules. For example, “viral” may be rendered as “viral” or “vírico”, and “bacterial” as “bacterial”/“bacteriano”/“bacteriana”/“bacterianos”/“bacterianas” (considering gender and number morphological variants). Greco-Latin prefixes show variants such as “psi-” (“psicólogo” vs “sicólogo”) or “pseudo-”. The use of hyphens between words is more systematic in English, while in Spanish many variants occur. For instance, “beta-carotene” is rendered as “beta-caroteno”/“beta caroteno”/“betacaroteno”/“caroteno beta”. The names of pharmacological substances sometimes remain the same as in English and are sometimes adapted. For example, “furazosin” is adapted to “furazosina”/“furazosín”/“furazosin”. Concerning grammatical gender (masculine/feminine), some terms are ambiguous (“la COVID”/“el COVID”) and others allow both forms. An example is “el tiroides” (masculine)/“la tiroides” (feminine) for “thyroid”.

Clinical notes contain many abbreviations, and English abbreviations usually coexist with Spanish ones. For instance, “PSA” corresponds to “prostate-specific antigen” and is preferred to “APE” (“antígeno prostático específico”). In addition, polysemous abbreviations are very common in both languages.

From a syntactic point of view, clinical sentences are very similar in both languages. They are short sentences or phrases that use negation particles and non-standard abbreviations, and they contain misspellings, speculation and ungrammatical constructions, among other phenomena.

In summary, Spanish has more lexical variants of medical terms than English because terms are replicated or partially adapted. For these reasons, analyzing these texts is a more resource-consuming task, and normalization tools are required.

C. PHARMACONER 2019 SHARED TASK
PharmacoNER is “the first task on chemical and drug mention recognition from Spanish medical texts, namely from a corpus of Spanish clinical case studies” [1]. According to the organizers, “the main aim was to promote the development of named entity recognition tools of practical relevance, that is, chemical and drug mentions in non-English content, determining the current-state-of-the art, identifying challenges and comparing the strategies and results to those published for English data”.

The challenge consisted of two subtracks: (1) NER offset and entity classification, and (2) entity indexing. We focus on the NER task. In total, 22 teams participated in the first subtrack. Xiong et al. [38] placed first with an overall F-score of 91.05%. They used the multilingual version of the pre-trained BERT model⁴ and fine-tuned it on the PharmacoNER NER problem. The key to the success of their BERT implementation, compared with other participants' BERT implementations, was that they incorporated additional semantic and syntactic features, such as word shape and PoS tags, into their model's embedding layer. Moreover, they applied a Spanish biomedical abbreviation detection tool. However, they did not detail how the extracted abbreviations were used.

Stoeckel et al. [39] obtained the second-best results, which were updated after the formal challenge to an F-score of 90.52%. They used the Flair model and an additional corpus derived from SciELO, though of a smaller size than ours. They used this corpus to train word2vec and FastText word embeddings. For the Flair language model (LM)-based embeddings, they used pre-trained Spanish general-domain embeddings⁵.

Sun et al. [40] achieved the third-best result with an F-score of 89.24%. Like Xiong et al. [38], they used the pre-trained version of BERT with subsequent fine-tuning, but they did not incorporate any additional features.

Overall, many participants experimented with document encoding techniques. For example, Rivera Zavala et al. [41] gathered Spanish biomedical corpora of similar size to train their own FastText embeddings. Moreover, they used sense2vec [42] pre-trained embeddings. Both of these embeddings have proven useful in extracting biomedical concepts.

Later, other research papers addressed NER on the PharmacoNER corpus. Multi-task and stacked model approaches were proposed in [43], and their best multi-task approach achieved a 91.4% F-score. In another paper, a set of 104 sophisticated context patterns was constructed [44]. With this knowledge-based approach, the authors achieved an impressive F-score of 91%. We do not compare our results with those of these two papers. The approach of [43] required more annotated data, and the approach of [44] required manual rule construction relying on Spanish language syntax.

⁴ https://storage.googleapis.com/bert_models/2018_11_03/multilingual_L-12_H-768_A-12.zip
⁵ http://www.github.com/iamyihwa

II. METHODS
A. FLAIR
Flair embeddings were developed by the Zalando research group [17]. They are contextualized string embeddings in the sense that the contextualized embedding vectors are trained without any notion of words, treating texts purely as sequences of characters. This is the main difference between this type of embeddings and others such as word2vec [45], GloVe [7] and ELMo [10].

Flair is trained with an LM objective aimed at predicting the next character of a sequence, which preserves information on the character order in a text sequence. Because character-level representations are learned in both directions, each character receives context from both its right and its left. To build a word embedding from these character-level states, two hidden states are extracted and concatenated. The first is the forward LM's hidden state after the word's last character. The second is the backward LM's hidden state before the word's first character.

From a computational and memory point of view, these embeddings are more efficient to store and to train than word-level embedding models. Moreover, they have proven more effective for rare and out-of-vocabulary (OOV) words and for morphologically rich languages [17].

In our experiments, we use an enhanced version of Flair embeddings called Pooled Contextualized String Embeddings [46]. It differs from the earlier Flair LM in that it better handles the representation of words in an underspecified context. It dynamically aggregates the contextualized embeddings of each unique word. This pooled information is later used to expand the embedding of the same word when the word appears in a poorly or ambiguously specified context. This situation is often encountered in Spanish biomedical NER tasks, when two words with similar suffixes express different types of substances. An example is creatinina and hemoglobina, where the latter is a protein but the former is not.

B. BERT
Bidirectional Encoder Representations from Transformers (BERT) is the deep learning language representation model developed by the Google research team [11]. In contrast to ELMo and Flair, it can be used not only to generate contextualized word embeddings but also to solve downstream tasks directly, through a process called fine-tuning.

BERT is pre-trained with a masked language modeling objective over WordPiece tokens and a next sentence prediction objective. Its architecture is a stack of Transformer encoder layers, each with a multi-head self-attention mechanism. Because self-attention imposes no locality bias, BERT can better capture long-distance relationships among concepts.

BERT can be further pre-trained for a specific domain or fine-tuned for a specific task [47]. In particular, fine-tuning for token-level classification tasks is supported by placing a linear layer on top of the BERT model. This layer takes as input the final hidden state of each token.

C. ADDITIONAL EMBEDDINGS
It has been demonstrated that concatenating contextualized embeddings with standard embeddings usually improves results [10], [17]. Following this, our experiments concatenate Flair embeddings with several other embeddings. These are Spanish general (not domain-specific) FastText embeddings [8], domain-specific Spanish biomedical FastText embeddings [18], byte-pair encoded (BPE) embeddings [48] and character embeddings [49]. We present the results of models with and without these additional embeddings.

The general FastText embeddings for Spanish were trained on the full dump of Spanish-language Wikipedia. The Spanish domain-specific biomedical embeddings use the FastText architecture. They were trained on the SciELO⁶ corpus (100 million tokens) and the health section of Wikipedia (82 million tokens).

Character embeddings are generated with an RNN model and are then concatenated with the other types of word embeddings in the model.

The BPE model provides subword embeddings for 275 languages, of which we used only one (Spanish). It produces relatively lightweight embeddings, as they consist of sub-word tokens of words. This method has been shown to deal well with unknown words and to produce results on par with standard word embeddings.

⁶ SciELO.org

D. ENTITY EXTRACTION
In the PharmacoNER task, there are 4 relevant types of entity mentions, although only the first 3 types are used in the official evaluation:
• Normalizables (Normalizable): mentions of biomedical concepts that can be normalized to the SNOMED-CT and ChEBI vocabularies;
• No_Normalizables (Non-normalizable): biomedical concepts that cannot be normalized to the given vocabularies;
• Proteínas (Proteins): mentions of genes and proteins;
• Unclear (Unclear): general substance mentions.

The problem of biomedical NER can be framed as a sequence labeling task whose goal is to extract the correct entity spans. We therefore used the BIO scheme, in which each token in a document is classified as being at the [B]eginning, [I]nside, or [O]utside of an entity mention.

Apart from the BERT experiments, all experiments were conducted with the Flair framework⁷. Flair is built on top of PyTorch and provides a convenient means of experimenting with different combinations of word embeddings. It also provides an off-the-shelf neural system that supports entity extraction. We train a Long Short-Term Memory (LSTM) network with a hidden state of 256 dimensions, a learning rate of 0.1 and a mini-batch size of 8, optimized with Adam. We train for 150 epochs. To prevent overfitting, we use the model that performs best during training on the validation set provided by the competition organizers.

Experimenting with BERT embeddings within the Flair framework was not convenient for us. We therefore used a Google Cloud TensorFlow TPU setup both to train contextualized word embeddings and to fine-tune and predict on the downstream task, as it works much faster⁸. However, at the time of writing, TPUs did not support inference on downstream tasks, so we had to switch to CPU instances for this step.

We used a Conditional Random Fields loss [50], as it has been shown to increase accuracy on NER tasks. The training and evaluation batch sizes were set to 32 and 8, respectively, and the learning rate was set to 5e−5. The maximum sequence length was set to 160.

The common advice is to fine-tune the BERT model for just 3–10 epochs. We fine-tuned it for 30 epochs instead, as we noticed that this improved the predictions.

⁷ https://github.com/zalandoresearch/flair
⁸ https://cloud.google.com/ml-engine/docs/tensorflow/using-tpus

E. PHARMACONER CORPUS
The PharmacoNER corpus was used to train and test our models. It consists of 1000 annotated SPACCC articles derived from open access Spanish medical publications in SciELO. SciELO is an electronic library that systematically collects and stores complete full-text articles from scientific journals of Latin America, South Africa and Spain⁹.

Table 1 shows summary statistics of the PharmacoNER corpus.

TABLE 1. Statistics on PharmacoNER corpus
Size (sentences) | Size (words) | Entity types and counts
16,504 (16.5 sentences/case) | 396,988 (396.2 words/case) | Normalizables (4,426), No_Normalizables (55), Proteinas (2,291), Unclear (159)

Results are scored with the scoring tool distributed by the challenge organizers. For concepts, true positives are strict: the system concept span must match the beginning and end of a gold concept span exactly. We report micro-averaged results of the lenient evaluation, since that was the metric used to score the shared task.

To train the model, we merged the training and development corpora (11,970 sentences in total) and randomly held out 10% of the merged data for validation.

⁹ http://www.scielo.org

F. LANGUAGE MODEL TRAINING DATASET
We selected a subset of SciELO text using heuristics, so that it would be in line with the corpus used to train and test the model. In particular, we chose articles whose specified area is Health Sciences and then selected text from particular sections of those articles. Specifically, we selected text that starts with one of the section headings ‘Descripción del caso’ (case description), ‘Presentación de caso’ (case presentation), ‘Descripción de caso clínico’ (clinical case description) or ‘Caso clínico’ (clinical case). The selection ends at the section ‘Bibliografía’ (bibliography) or ‘Referencias’ (references). In this way we retrieved 1,368,080 sentences with 86,851,275 tokens.
System | Entity type | Precision | Recall | F-score
Sun’s BERT | Overall | 90.46% | 88.06% | 89.24%
Stoeckel’s Flair | Overall | 90.79% | 90.30% | 90.52%
Xiong’s BERT | Overall | 91.23% | 90.88% | 91.05%
 | Normalizables | 94.26% | 92.91% | 93.58%
 | Proteinas | 87.87% | 89.41% | 88.63%
 | No Normalizables | 100.00% | 20.00% | 33.33%
Flair_Sc_ext2 (ours) | Overall | 91.97% | 89.74% | 90.84%
 | Normalizables | 95.21% | 91.88% | 93.46%
 | Proteinas | 88.56% | 88.36% | 88.46%
 | No Normalizables | 0.00% | 0.00% | 0.00%
BERT_Sc (ours) | Overall | 89.29% | 87.83% | 88.55%
 | Normalizables | 91.48% | 91.67% | 91.57%
 | Proteinas | 86.74% | 84.52% | 85.61%
 | No Normalizables | 0.00% | 0.00% | 0.00%
Per-type scores are not reported for Sun’s BERT and Stoeckel’s Flair.
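The scores in the table above are micro-averaged over all entity mentions. The sketch below is a hedged illustration and not a reimplementation of the organizers' scoring tool. It decodes BIO tag sequences into typed spans and then computes micro-averaged precision, recall and F-score under strict matching. Under strict matching, a predicted mention is a true positive only if its type, start and end all coincide with a gold mention. The tag format `B-TYPE`/`I-TYPE`/`O`, the tag names in the example and the function names are our assumptions.

```python
Span = tuple[str, int, int]  # (entity type, start token, end token exclusive)


def bio_to_spans(tags: list[str]) -> list[Span]:
    """Decode a BIO tag sequence into (type, start, end) spans.

    An I- tag that does not continue a span of the same type opens a new
    span; this lenient decoding choice is an assumption.
    """
    spans: list[Span] = []
    current_type, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final span
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and etype == current_type
        if current_type is not None and not continues:
            spans.append((current_type, start, i))
            current_type = None
        if prefix == "B" or (prefix == "I" and not continues):
            current_type, start = etype, i
    return spans


def micro_prf(gold: list[list[str]], pred: list[list[str]]) -> tuple[float, float, float]:
    """Strict micro-averaged precision, recall and F-score over all sentences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    # Prefix each span with its sentence index so that mentions from
    # different sentences are never confused with each other.
    gold_spans = {(k, *s) for k, tags in enumerate(gold) for s in bio_to_spans(tags)}
    pred_spans = {(k, *s) for k, tags in enumerate(pred) for s in bio_to_spans(tags)}
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    gold = [["B-PROTEINAS", "I-PROTEINAS", "O", "B-NORMALIZABLES"]]
    pred = [["B-PROTEINAS", "O", "O", "B-NORMALIZABLES"]]
    print(micro_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

In the example, the truncated protein mention counts as both a false positive and a false negative under strict matching. This is why boundary errors weigh heavily on strict scores.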
FIGURE 2. Distribution of errors for short (fewer than 3 terms) and long (3 or more terms) entities.
beddings are used.

The results of different variations of stacking word embeddings are shown in Table 3.

In general, LM-based embeddings lead to better results than the standard ones. The model enriched with different types of word embeddings also gives better precision, recall and F-score. Domain-specific word embeddings improve the results, even though they are much smaller than the general-domain ones. Augmenting word embeddings with additional subword-level embeddings, such as FastText, BPE and character embeddings, further improves the results.

We also experimented with searching for concepts in SNOMED-CT using the MeaningCloud tool¹². However, this did not work well, as many concepts in the shared task were annotated based on their synonyms.

¹² https://www.meaningcloud.com/

Error analysis. For this analysis, we split the gold standard entities into 2 groups: short entities, with a length of at most 2 terms, and long entities, with a length of 3 or more terms. For the best model, Flair_Sc_ext2, the origin and distribution of errors are presented in Fig. 2.

The majority of errors occur for short predicted entities that do not even partially overlap any gold standard entity (No-intersection false positives (FP)). Indeed, many biomedical entities are acronyms and abbreviations, which can easily be misclassified based on their casing and length. Interestingly, the second main source of errors for short predicted entities is that the model predicts two entities where the gold standard has a single entity (Longer FP). A smaller number of errors relate to short gold standard entities that the model fails to detect (false negatives, FN). For long entities, the main source of error is that predicted entities are shorter than required (Shorter FP), contributing nearly 75% of the total
Example 2
Correct annotation: proteína S-100 (Dako, L1845, USA, prediluída), neurofilamentos (Biogenex 6670-0154, USA), enolasa neuroespecífica NSE,
Flair_Sc_ext2: proteína S-100 (Dako, L1845, USA, prediluída), neurofilamentos (Biogenex 6670-0154, USA), enolasa neuroespecífica NSE, Errors: FP, shorter FP
Flair_Sc: proteína S-100 (Dako, L1845, USA, prediluída), neurofilamentos (Biogenex 6670-0154, USA), enolasa neuroespecífica NSE, Errors: FP, shorter FP
Standard_Sc: proteína S-100 (Dako, L1845, USA, prediluída), neurofilamentos (Biogenex 6670-0154, USA), enolasa neuroespecífica NSE, Errors: FP, longer FP
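The error categories used in this analysis (exact matches, shorter and longer false positives, false negatives) can be made concrete with a small span-comparison routine. The sketch below is a hypothetical illustration, not the authors' evaluation code. It assumes entities are (start, end) character offsets with an exclusive end. It uses a purely boundary-based reading of the labels: a prediction strictly inside a gold span is a "shorter FP", and one strictly covering a gold span is a "longer FP". This may not match exactly how the labels were assigned in the study. The "spurious FP" and "boundary FP" buckets, the function names `classify_errors` and `error_share`, and the choice of denominator in `error_share` are all assumptions introduced for illustration.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical span representation: (start, end) character offsets, end exclusive.
Span = Tuple[int, int]


def _overlaps(a: Span, b: Span) -> bool:
    # Two half-open intervals overlap if each starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]


def classify_errors(gold: List[Span], pred: List[Span]) -> Counter:
    """Bucket predictions and misses into span-level error types (illustrative only)."""
    counts: Counter = Counter()
    gold_set = set(gold)
    for p in pred:
        if p in gold_set:
            counts["TP"] += 1  # exact boundary match with a gold entity
            continue
        touched = [g for g in gold if _overlaps(p, g)]
        if not touched:
            counts["spurious FP"] += 1  # overlaps no gold entity at all
        elif any(g[0] <= p[0] and p[1] <= g[1] for g in touched):
            counts["shorter FP"] += 1  # prediction lies inside a gold span
        elif any(p[0] <= g[0] and g[1] <= p[1] for g in touched):
            counts["longer FP"] += 1  # prediction covers a gold span plus extra text
        else:
            counts["boundary FP"] += 1  # partial overlap crossing one boundary
    for g in gold:
        # Gold entities that no prediction touches at all are counted as misses.
        if not any(_overlaps(p, g) for p in pred):
            counts["FN"] += 1
    return counts


def error_share(counts: Counter, category: str) -> float:
    """Fraction of all non-TP outcomes that fall into `category` (assumed denominator)."""
    errors = sum(n for k, n in counts.items() if k != "TP")
    return counts[category] / errors if errors else 0.0


if __name__ == "__main__":
    # Toy offsets, not taken from the SPACCC corpus.
    gold = [(0, 13), (20, 30), (50, 60)]
    pred = [(0, 13), (22, 30), (18, 32), (40, 45)]
    c = classify_errors(gold, pred)
    print(c, error_share(c, "shorter FP"))
```

The toy spans in the `__main__` block produce one outcome in each of the TP, shorter FP, longer FP, spurious FP and FN buckets. As a result, `error_share` for "shorter FP" is 0.25. To reproduce the short/long breakdown discussed above, the spans would first be grouped by entity length before counting.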
[4] Kim D, Lee J, So CH, Jeon H, Jeong M, Choi Y, et al. A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining. IEEE Access. 2019;7:73729–73740. Available from: https://ieeexplore.ieee.org/document/8730332/.
[5] Jin Q, Dhingra B, Liu Z, Cohen WW, Lu X. PubMedQA: A Dataset for Biomedical Research Question Answering. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing; 2019. p. 2567–2577. Available from: http://arxiv.org/abs/1909.06146.
[6] Mikolov T, Chen K, Corrado G, Dean J. Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781; 2013. Available from: http://arxiv.org/abs/1301.3781.
[7] Pennington J, Socher R, Manning CD. GloVe: Global Vectors for Word Representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP); 2014. Available from: https://nlp.stanford.edu/pubs/glove.pdf.
[8] Bojanowski P, Grave E, Joulin A, Mikolov T. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics. 2017;5:135–146. Available from: http://arxiv.org/abs/1607.04606.
[9] Habibi M, Weber L, Neves M, Wiegandt DL, Leser U. Deep learning with word embeddings improves biomedical named entity recognition. Bioinformatics. 2017;33(14):i37–i48. Available from: https://academic.oup.com/bioinformatics/article/33/14/i37/3953940.
[10] Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, et al. Deep contextualized word representations. arXiv preprint arXiv:1802.05365; 2018. Available from: http://arxiv.org/abs/1802.05365.
[11] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805; 2018. Available from: http://arxiv.org/abs/1810.04805.
[12] Han X, Eisenstein J. Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing; 2019. p. 4237–4247. Available from: https://github.com/xhan77/.
[13] Howard J, Ruder S. Universal Language Model Fine-tuning for Text Classification. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics; 2018. p. 328–339. Available from: http://nlp.fast.ai/ulmfit.
[14] Dai AM, Le QV. Semi-supervised Sequence Learning. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, editors. Advances in Neural Information Processing Systems 28. Curran Associates, Inc.; 2015. p. 3079–3087. Available from: http://papers.nips.cc/paper/5949-semi-supervised-sequence-learning.pdf.
[15] Radford A. Improving Language Understanding by Generative Pre-Training; 2018. Available from: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf and https://www.semanticscholar.org/paper/Improving-Language-Understanding-by-Generative-Radford/cd18800a0fe0b668a1cc19f2ec95b5003d0a5035.
[16] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention Is All You Need. In: Advances in Neural Information Processing Systems; 2017. p. 5998–6008. Available from: http://arxiv.org/abs/1706.03762.
[17] Akbik A, Blythe D, Vollgraf R. Contextual String Embeddings for Sequence Labeling. In: COLING; 2018. Available from: https://github.com/zalandoresearch/flair.
[18] Soares F, Villegas M, Gonzalez-Agirre A, Krallinger M, Armengol-Estapé J. Medical Word Embeddings for Spanish: Development and Evaluation. In: Proceedings of the 2nd Clinical Natural Language Processing Workshop; 2019. p. 124–133. Available from: http://doi.org/10.5281/zenodo.2542722.
[19] Akhtyamova L. Named Entity Recognition in Spanish Biomedical Literature: Short Review and Bert Model. In: 2020 26th Conference of Open Innovations Association (FRUCT). IEEE; 2020. p. 1–7. Available from: https://ieeexplore.ieee.org/document/9087359/.
[20] Eftimov T, Koroušić Seljak B, Korošec P. A rule-based named-entity recognition method for knowledge extraction of evidence-based dietary recommendations. PLoS ONE. 2017;12(6). Available from: https://doi.org/10.1371/journal.pone.0179488.
[21] Funk C, Baumgartner W, Garcia B, Roeder C, Bada M, Cohen KB, et al. Large-scale biomedical concept recognition: An evaluation of current automatic annotators and their parameters. BMC Bioinformatics. 2014;15(1).
[22] Marimon M, Gonzalez-Agirre A, Intxaurrondo A, Rodríguez H, Antonio Lopez Martin J, Villegas M, et al. Automatic De-Identification of Medical Texts in Spanish: the MEDDOCAN Track, Corpus, Guidelines, Methods and Evaluation of Results. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019); 2019. p. 618–638. Available from: https://github.com/PlanTL-SANIDAD.
[23] Perez N, García-Sardiña L, Serras M, Pozo AD. Vicomtech at MEDDOCAN: Medical Document Anonymization. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019); 2019. p. 698–703. Available from: https://github.com/PlanTL-SANIDAD/SPACCC.
[24] López P, D MC, Alfonso Ureña-López L, Teresa Mart M. Anonymization of Clinical Reports in Spanish: a Hybrid Method Based on Machine Learning and Rules. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019); 2019. p. 688–695. Available from: https://catalog.ldc.upenn.edu/LDC2018T01.
[25] Lee KJ, Hwang YS, Kim S, Rim HC. Biomedical named entity recognition using two-phase model based on SVMs. Journal of Biomedical Informatics. 2004;37(6):436–447.
[26] Kim JD, Ohta T, Tateisi Y, Tsujii J. GENIA corpus–a semantically annotated corpus for bio-textmining. Bioinformatics. 2003;19(Suppl 1):i180–i182. Available from: https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btg1023.
[27] Nguyen TVT, Moschitti A, Riccardi G. Kernel-based reranking for named-entity extraction. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters. Association for Computational Linguistics (ACL); 2010. p. 901–909. Available from: https://dl.acm.org/citation.cfm?id=1944670.
[28] Collins M. Ranking algorithms for named-entity extraction. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics (ACL); 2001. p. 489–496.
[29] Björne J, Salakoski T. Generalizing Biomedical Event Extraction. In: Proceedings of BioNLP Shared Task 2011 Workshop; 2011. p. 183–191. Available from: http://svmlight.joachims.org/svm_.
[30] Takeuchi K, Collier N. Bio-medical entity extraction using support vector machines. Artificial Intelligence in Medicine. 2005;33(2):125–137.
[31] Isozaki H, Kazawa H. Efficient support vector classifiers for named entity recognition. Association for Computational Linguistics (ACL); 2002. p. 1–7.
[32] Patra R, Saha SK. A kernel-based approach for biomedical named entity recognition. The Scientific World Journal. 2013;2013.
[33] Li D, Huang L, Ji H, Han J. Biomedical Event Extraction Based on Knowledge-driven Tree-LSTM. In: Proceedings of NAACL-HLT 2019. Association for Computational Linguistics; 2019. p. 1421–1430.
[34] Simeonova L, Simov K, Osenova P, Nakov P. A Morpho-Syntactically Informed LSTM-CRF Model for Named Entity Recognition. arXiv preprint arXiv:1908.10261; 2019. Available from: http://github.com/lilia-simeonova/.
[35] Sharma S, Daniel R. BioFLAIR: Pretrained Pooled Contextualized Embeddings for Biomedical Sequence Labeling Tasks. arXiv preprint arXiv:1908.05760; 2019. Available from: http://arxiv.org/abs/1908.05760.
[36] Miller T, Geva A, Dligach D. Extracting Adverse Drug Event Information with Minimal Engineering. In: Proceedings of the 2nd Clinical Natural Language Processing Workshop; 2019. p. 22–27. Available from: https://www.aclweb.org/anthology/W19-1903.
[37] Basaldella M, Collier N. BioReddit: Word Embeddings for User-Generated Biomedical NLP. In: Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019). Association for Computational Linguistics (ACL); 2019. p. 34–38.
[38] Xiong Y, Shen Y, Huang Y, Chen S, Tang B, Wang X, et al. A Deep Learning-Based System for PharmaCoNER. In: Proceedings of the 5th Workshop on BioNLP Open Shared Tasks; 2019. p. 33–37. Available from: https://github.com/PlanTL-SANIDAD/SPACCC_POS-.
[39] Stoeckel M, Hemati W, Mehler A. When Specialization Helps: Using Pooled Contextualized Embeddings to Detect Chemical and Biomedical Entities in Spanish. In: Proceedings of the 5th Workshop on BioNLP Open Shared Tasks; 2019. p. 11–15. Available from: www.github.com/zalandoresearch/flair.
[40] Sun C, Yang Z. Transfer Learning in Biomedical Named Entity Recognition: An Evaluation of BERT in the PharmaCoNER task. In: Proceedings of the 5th Workshop on BioNLP Open Shared Tasks; 2019. p. 100–104.
[41] Rivera Zavala RM, Martínez P. Deep neural model with enhanced embeddings for pharmaceutical and chemical entities recognition in Spanish clinical text. In: Proceedings of the 5th Workshop on BioNLP Open Shared Tasks; 2019. p. 38–46. Available from: https://ufal.mff.
[42] Trask A, Michalak P, Liu J. sense2vec - A Fast and Accurate Method for Word Sense Disambiguation In Neural Word Embeddings. arXiv preprint arXiv:1511.06388; 2015. Available from: http://arxiv.org/abs/1511.06388.
[43] Lange L, Adel H, Strötgen J. Closing the Gap: Joint De-Identification and Concept Extraction in the Clinical Domain. arXiv preprint arXiv:2005.09397; 2020. Available from: https://arxiv.org/abs/2005.09397.
[44] Sánchez León F, Ledesma AG. Annotating and normalizing biomedical NEs with limited knowledge. arXiv preprint arXiv:1912.09152; 2019. Available from: https://arxiv.org/abs/1912.09152v1.
[45] Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed Representations of Words and Phrases and their Compositionality. In: Advances in Neural Information Processing Systems; 2013. p. 3111–3119. Available from: https://arxiv.org/pdf/1310.4546.pdf.
[46] Akbik A, Bergmann T, Vollgraf R. Pooled Contextualized Embeddings for Named Entity Recognition. In: NAACL; 2019. Available from: https://github.com/zalandoresearch/flair.
[47] Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2019;(btz682). Available from: https://github.com/dmis-lab/biobert.
[48] Heinzerling B, Strube M. BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018); 2018. Available from: https://aclweb.org/anthology/papers/L/L18/L18-1473/.
[49] Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C. Neural Architectures for Named Entity Recognition. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA, USA: Association for Computational Linguistics; 2016. p. 260–270. Available from: http://aclweb.org/anthology/N16-1030.
[50] Lafferty JD, McCallum A, Pereira FCN. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In: Proceedings of the Eighteenth International Conference on Machine Learning. ICML ’01. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; 2001. p. 282–289. Available from: http://dl.acm.org/citation.cfm?id=645530.655813.
[51] Martínez P, Martínez JL, Segura-Bedmar I, Moreno-Schneider J, Luna A, Revert R. Turning user generated health-related content into actionable knowledge through text analytics services. Computers in Industry. 2016;78:43–56.

LILIYA AKHTYAMOVA (ORCID: 0000-0003-4338-1483) Department of Computing, Technological University Dublin, Dublin, Ireland.
Liliya Akhtyamova received the M.S. degree in Applied Mathematics and Physics from the Moscow Institute of Physics and Technology (State University), Dolgoprudniy, Moscow obl., Russia, in 2017. She is currently pursuing the Ph.D. degree in Information Technology at the Technological University Dublin, Dublin, Ireland. In Spring 2019, she completed a traineeship under an Erasmus grant at the Carlos III University of Madrid, Madrid, Spain. This paper resulted from the collaborative work performed at that time. Her research interests include biomedical text mining, named entity recognition, and deep learning.

PALOMA MARTINEZ (ORCID: 0000-0003-3013-3771) Computer Science and Engineering Department, University Carlos III of Madrid.
Paloma Martinez received the degree in Computer Science and the Ph.D. degree in Computer Science from the Universidad Politécnica de Madrid (Spain) in 1992 and 1998, respectively. She is the Head of the Human Language and Accessibility Technologies (HULAT) group in the Computer Science and Engineering Department, University Carlos III of Madrid.
Her research interests are human language technologies, with a focus on information extraction in the biomedical domain, and web accessibility. She is co-author of more than 40 articles in indexed journals and more than a hundred international conference contributions. She has been principal investigator of, and has participated in, over 40 national and international research projects. Currently, she is a member of the Spanish Society for Natural Language Processing (SEPLN) and of the Dynamization Network for Activities on Natural Language Processing Technologies. She is a collaborator of the Spanish Center of Captioning and Audiodescription (CESyA). +info: http://hulat.inf.uc3m.es/pmf.

KARIN VERSPOOR (ORCID: 0000-0002-8661-1544) is a Professor in the School of Computing and Information Systems at the University of Melbourne.
She received a BA in Computer Science and Cognitive Sciences from Rice University in 1993, an MSc degree in Cognitive Science and Natural Language from the University of Edinburgh (UK) in 1994, and a PhD in Cognitive Science from the University of Edinburgh (UK) in 1997.
After a post-doc at Macquarie University in Sydney, Australia, she spent 5 years in artificial intelligence start-ups. She then held research roles at Los Alamos National Laboratory, the University of Colorado School of Medicine, and National Information Communications Technology Australia (NICTA), before joining the University of Melbourne.
This work was completed while she was a long-term visitor at the University Carlos III of Madrid (Spain), hosted by Paloma Martinez and with the support of the University of Melbourne. Her research focuses on biomedical text mining and clinical data analysis.