
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3018688, IEEE Access

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.DOI

Testing Contextualized Word Embeddings to Improve NER in Spanish Clinical Case Narratives
LILIYA AKHTYAMOVA1, PALOMA MARTÍNEZ2, KARIN VERSPOOR3, AND JOHN CARDIFF1
1 Technological University Dublin, Dublin, Ireland (e-mail: john.cardiff@TUDublin.ie)
2 Carlos III University of Madrid, Madrid, Spain (e-mail: pmf@inf.uc3m.es)
3 The University of Melbourne, Melbourne, Australia (e-mail: karin.verspoor@unimelb.edu.au)
Corresponding author: Liliya Akhtyamova (e-mail: akhtyamova@phystech.edu).
Part of this work was supported by the Research Program of the Ministry of Economy and Competitiveness - Government of Spain,
(DeepEMR project TIN2017-87548-C2-1-R). KV’s contributions were supported by the University of Melbourne, through a Study Leave
grant. LA’s contributions were supported by the Technological University Dublin as part of a traineeship in UC3M Spain, through an
Erasmus+ grant.

ABSTRACT In the Big Data era, there is an increasing need to fully exploit and analyze the huge quantity
of information available about health. Natural Language Processing (NLP) technologies can contribute by
extracting relevant information from unstructured data contained in Electronic Health Records (EHR) such
as clinical notes, patients’ discharge summaries and radiology reports. The extracted information can help
in health-related decision-making processes. The Named Entity Recognition (NER) task, which detects
important concepts in texts (e.g., diseases, symptoms, drugs), is crucial in the information extraction
process, yet it has received little attention in languages other than English.
In this work, we develop a deep learning-based NLP pipeline for biomedical entity extraction in Spanish
clinical narratives. We explore the use of contextualized word embeddings, which incorporate context
variation into word representations, to enhance named entity recognition in Spanish language clinical text,
particularly of pharmacological substances, compounds, and proteins. Various combinations of word and
sense embeddings were tested on the evaluation corpus of the PharmacoNER 2019 task, the Spanish Clinical
Case Corpus (SPACCC). This data set consists of clinical case sections extracted from open access Spanish-
language medical publications.
Our study shows that our deep-learning-based system with domain-specific contextualized embeddings
coupled with stacking of complementary embeddings yields superior performance over a system with
integrated standard and general-domain word embeddings. With this system, we achieve performance
competitive with the state-of-the-art.

INDEX TERMS Clinical case narratives, Contextualized word embeddings, Deep learning, Language
representations, Named entity recognition, Natural language processing, Spanish language

I. BACKGROUND
Currently, most research in Natural Language Processing (NLP) is focused on English-language texts, while texts written in other languages are often left unexplored; this has been particularly true in the domain of biomedicine. Given the amount of data produced every year by biomedical experts, doctors, and patients in non-English-speaking countries, this represents a significant missed opportunity.
The PharmacoNER 2019 challenge [1] aimed to close the gap in named entity recognition (NER) of biomedical concepts in a corpus of Spanish clinical case narratives. The corpus includes annotations of clinical terminology, chemical and protein entities.
Extraction of biomedical entities from these narratives is relevant to a number of NLP tasks such as adverse drug event and drug-drug interaction extraction [2], [3], biomedical concept normalization, knowledge base population [4], and question answering [5].
Recent developments in NLP have shown the advantage

VOLUME 4, 2016 1

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.

Akhtyamova et al.: Testing Contextualized Word Embeddings to Improve NER in Spanish Clinical Case Narratives

of Neural Network (NN)-based methods, particularly those based on Deep Learning, over traditional Machine Learning (ML) algorithms. However, beyond the development of new NN-based methods, researchers have started to explore the impact of improved strategies for representing the text information provided as input to both NN-based and other ML methods.
Starting from Bag of Words (BoW) representations, input text representation has evolved to include more sophisticated word representations such as word2vec word embeddings [6], GloVe [7] and FastText [8] embeddings, the last of which can also capture subword information. Applied in a range of different NLP tasks, methods using word embeddings have led to significant breakthroughs in model performance for biomedical NER tasks where limited training data is available [9].
Further advances in text representation have been proposed based on language models, which give a word a different embedding vector depending on its usage context. The embedding function is trained either with a language modeling objective [10] or by recovering masked tokens [11]. Downstream tasks that incorporate these embeddings are considered to be learned in a semi-supervised manner because they benefit from large amounts of unlabeled data [12], [13].
Language representation models can further be applied, with or without fine-tuning, to problems arising in different domains1. The approach of learning on one dataset and applying the model to another dataset is called Transfer Learning.
Among recently introduced contextualized embeddings are Semi-supervised Sequence Learning [14], ELMo [10], ULMFiT [13], the OpenAI transformer [15], the Transformer [16], BERT [11] and Flair [17].
In our experiments, we explore the use of both Flair and BERT contextualized embeddings, as they have been shown to outperform other types of embeddings on a variety of sequence labeling tasks [11], [17].
In addition to pre-trained domain-specific Spanish FastText embeddings [18], we generate domain-specific Spanish contextualized embeddings by pre-training language representation models on a corpus retrieved from the Scientific Electronic Library Online (SciELO) website2. The clinical case narratives from the publications there were used to construct the PharmacoNER dataset. To the best of our knowledge, these are the first contextualized embeddings for Spanish clinical texts made available to a wide audience. The large corpus of more than one million sentences from SciELO that we make available is itself a valuable resource.
This paper extends and deepens a preliminary version of our experiments, described in [19]. In particular, we add experiments using the Flair framework, which outperform our previous results obtained with the BERT model. We also experiment with word embedding stacking approaches, further improving the results we obtained on the PharmacoNER corpus.
The contributions described in this paper are as follows: (1) we retrieve task-specific corpora for training; (2) we construct task-specific contextualized word embeddings from scratch based on the Flair and BERT architectures; (3) we compare model performance based on the constructed word embeddings, explore how these may be combined with other types of embeddings, and compare them with standard embeddings, producing new baselines; and (4) we conduct an extensive error analysis, checking the sources of errors for different models.
The pretrained weights for the Flair and BERT models, as well as the SciELO corpora used for their training, are made publicly available in a Google Drive repository3.
1 https://ai.googleblog.com/2019/07/advancing-semi-supervised-learning-with.html
2 https://scielo.org/es
3 http://dx.doi.org/10.17632/vf6jmvz83b.3file-75396370-ff40-4a7c-bdfb-d178414bf9b0

A. BIOMEDICAL ENTITY EXTRACTION APPROACHES
Simple approaches to biomedical NER, which sometimes give surprisingly good results, have made use of rules or dictionaries.
For example, Eftimov et al. [20] built a set of regular expressions to extract evidence-based dietary recommendations from scientific publications and websites. They first detected target mentions in textual data and then extracted them using the rule-based technique.
Various strategies for dictionary lookup have also been shown to be effective [21]. Such approaches leverage biomedical terminology resources or ontologies, and are particularly relevant for biomedical NER, where named entities often correspond to fine-grained domain-specific concepts.
However, with the development of automatic NLP methods, these methods are rarely applied on their own to solve NER tasks; rather, they are used to generate features for ML and deep learning (DL) models. For example, in the recent Meddocan challenge on Spanish medical document anonymization [22], rule-based techniques were actively used within ML and DL methods to identify patients’ email addresses, locations, phone numbers, etc. In addition, participants in the challenge used domain- and language-specific gazetteers and Brown clusters derived through unsupervised ML. For example, Perez et al. [23] concluded that Brown clusters and gazetteers played a significant role in ML system performance. Further, Lopez et al. [24] tested both ML and rule-based approaches and concluded that a hybrid of the two gives the best result.
Lee et al. [25] solve the problem of biomedical NER in two steps: first discovering entity boundaries using Support Vector Machines (SVM), and then applying an ontology-based hierarchical classification method to classify the identified entities. Their system achieved a promising 66.7% F-score on the GENIA corpus [26].
Early work on machine learning-based NER includes such techniques as reranking relying on kernels [27] as well as


pure feature processing [28]. Kernel-based methods for entity extraction such as SVM, utilized in numerous papers [29]–[31], became popular for extracting entities from texts, including biomedical texts [32]. In the latter paper, the authors examined different kernel functions for the problem of biomedical NER and concluded that a tree-based kernel is more capable of entity extraction.
Current state-of-the-art methods for NER are based on NN architectures, in particular DL convolutional NNs (CNN) and recurrent NNs (RNN). Transfer learning approaches, in particular the use of pre-trained contextualized word embeddings, have improved the performance of these methods, giving strong results in a number of downstream tasks.
For example, in the Meddocan shared task the best result was achieved by a system which fed pretrained contextualized Flair embeddings into a simple RNN model. However, when dealing with more complex biomedical NER problems, including long, discontinuous and overlapping entities, hybrid approaches show the best results. Li et al. [33] integrated knowledge base (KB) embeddings into their tree-structured LSTM framework, achieving an approximately 3% gain in F-score.
Related to this, contextualized word embeddings together with part-of-speech (PoS) tags were examined for Bulgarian NER [34], showing sizeable improvements over the state-of-the-art. In another work, a combination of different types of contextualized embeddings was explored over English biomedical literature corpora [35]. The best results were obtained when combining ELMo and Flair word embeddings.
Another relevant work addresses the extraction of adverse drug events on the 2018 N2C2 shared task corpus [36]. The authors experimented with the off-the-shelf Flair NER framework and kernel-based methods and concluded that a neural Flair-based approach outperforms standard SVM-based methods. In the work of Basaldella et al. [37], the authors pretrained ELMo and Flair contextualized word embeddings on health forums within Reddit and applied them to health social media data for various NER problems. They concluded that domain-based contextualized word embeddings heavily influence performance on downstream tasks, outperforming embeddings trained either on general-purpose data or on scientific papers when applied to user-generated content. Our experiments are very similar to this work.
An extensive overview of recent advances in the NLP field can be found in the work of Minaee et al. [?]. While focusing on document classification, it describes several methods, such as transformers, which are fully applicable to NER. Young et al. [?] cover in detail a key element of the current paper, namely distributed and contextualized word representations, among other recent trends introduced in NLP.

B. SPANISH CLINICAL TEXT PROCESSING
Spanish is an inflectional language with a richer morphology than English; morphemes denote several syntactic, semantic and grammatical features of words (such as gender, number, etc.). From a syntactic point of view, Spanish texts have more subordinate clauses and long sentences with high word order flexibility; for instance, the subject can be located in any position in a sentence instead of only before the verb.
There are a number of peculiarities of clinical texts in Spanish. Due to the translation of English biomedical terms, there are more variants of anglicisms. Some of them are freely adapted and others are exact copies of the original ones; for instance, “interleukin” is translated to “interleukina”/“interleucina”/“interleuquina”. Moreover, Spanish uses accent marks which do not exist in English, and the preference for using them or not generates lexical variants; for instance, “period” may be rendered as “periodo” or “período”. Adjectives ending in “-al” sometimes keep their form when translated and sometimes follow Spanish morphological rules; for example, “viral” may be transformed to “viral” or “vírico”, and “bacterial” to “bacterial”/“bacteriano”/“bacteriana”/“bacterianos”/“bacterianas” (considering gender and number morphological variants). Greco-Latin prefixes show variants like “psi-” (“psicólogo” vs “sicólogo”) or “pseudo-”. The use of hyphens between words is more systematic in English, while in Spanish many variants occur. For instance, “beta-carotene” is transformed into “beta-caroteno”/“beta caroteno”/“betacaroteno”/“caroteno beta”. The names of pharmacological substances sometimes remain the same as in English and are otherwise adapted, e.g. “furazosin” is adapted to “furazosina”/“furazosín”/“furazosin”. Concerning grammatical gender (masculine/feminine), some terms are ambiguous (“la COVID”/“el COVID”) or allow both forms, for instance, “el tiroides” (masculine)/“la tiroides” (feminine) for “thyroid”.
Clinical notes contain many abbreviations, and English abbreviations usually coexist with Spanish ones. For instance, “PSA” corresponds to “prostate-specific antigen” and is preferred to “APE” (“antígeno prostático específico”). However, polysemous abbreviations are very common in both languages.
From a syntactic point of view, clinical sentences are very similar in both languages (short sentences or phrases, with use of negation particles and non-standard abbreviations, misspellings, speculation and ungrammatical sentences, among other phenomena).
In summary, there are more lexical variants of medical terms in Spanish than in English due to replication or partial adaptation of terms. For these reasons, analyzing these texts is a more resource-consuming task, and normalization tools are required.

C. PHARMACONER 2019 SHARED TASK
PharmacoNER is “the first task on chemical and drug mention recognition from Spanish medical texts, namely from a corpus of Spanish clinical case studies” [1]. According to the organizers, “the main aim was to promote the development of named entity recognition tools of practical relevance, that is, chemical and drug mentions in non-English content, determining the current-state-of-the art, identifying challenges and comparing the strategies and results to those published for English data”.
The challenge consisted of two subtracks: (1) NER offset and entity classification and (2) entity indexing. We focus on the NER task. In total, 22 teams participated in the first subtrack. Xiong et al. [38] placed first with an overall F-score of 91.05%. They used the multilingual version of the pre-trained BERT model4 with further fine-tuning to the PharmacoNER NER problem. The key to the success of their BERT implementation, compared with other participants’ BERT implementations, was that they incorporated more semantic and syntactic features, such as word shape and PoS tags, into their model’s embedding layer. Moreover, they applied a Spanish biomedical abbreviation detection tool; however, they did not detail how the extracted abbreviations were further used.
The second-best results, by Stoeckel et al. [39], were updated after the formal challenge to an F-score of 90.52%. They used the Flair model and made use of an additional corpus derived from SciELO, though of a smaller size than ours. They used this corpus to train word2vec and FastText word embeddings, while for the Flair language model (LM)-based embeddings they used pre-trained Spanish general-domain embeddings5.
Sun et al. [40] achieved the third-best result with an F-score of 89.24%. Like Xiong et al. [38], they also used the pre-trained version of BERT with subsequent fine-tuning, but without incorporating any additional features.
Overall, many participants experimented with document encoding techniques. For example, Rivera Zavala et al. [41] gathered Spanish biomedical corpora of similar size to train their own FastText embeddings. Moreover, they used sense2vec [42] pre-trained embeddings. Both of these embeddings have proven useful in extracting biomedical concepts.
Later, other research papers appeared addressing NER on the PharmacoNER corpus. Multi-task and stacked model approaches were proposed in [43]; their best multi-task approach achieved a 91.4% F-score. In another paper, a set of 104 sophisticated context patterns was constructed [44]. With this knowledge-based approach, the authors achieved an impressive 91% F-score. We do not compare our results with those of these two papers, as the approach of [43] required more annotated data, and the approach of [44] required manual rule construction relying on Spanish language syntax.

II. METHODS
A. Flair
Flair embeddings were developed by the Zalando research group [17]. They are contextualized string embeddings in the sense that the contextualized embedding vectors are trained without any notion of words, purely treating texts as sequences of characters. This is the main difference between this type of embeddings and others such as word2vec [45], GloVe [7], and ELMo [10].
Flair is trained using an LM objective aimed at predicting the next character of a sequence, thus keeping information on the character ordering in a text sequence. By learning character-level representations in both directions, it is possible to obtain the context of each character from both the right and the left. To generate a word embedding from characters, the forward LM’s hidden state after the last character of the word and the backward LM’s hidden state before its first character are extracted and concatenated.
From the computational and memory point of view, these embeddings are more efficient to store and to train a model for word embeddings. Moreover, they have proven to be more effective for rare and out-of-vocabulary (OOV) words and for morphologically rich languages [17].
In our experiments, we use an enhanced version of Flair embeddings called Pooled Contextualized String Embeddings [46]. It differs from the previously developed Flair LM in that it better handles the representation of words in an underspecified context. The contextualized embeddings of each unique word are dynamically aggregated, and this information is later used to expand the embedding of the same word when it is encountered in a poorly or ambiguously specified context. This situation is often encountered in Spanish biomedical NER tasks when two words with similar suffixes express different types of substances, as for example creatinina and hemoglobina, where the latter is a protein but the former is not.

B. BERT
Bidirectional Encoder Representations from Transformers (BERT) is the deep learning language representation model developed by the Google research team [11]. In contrast to ELMo and Flair, it can be used not only to generate contextualized word embeddings, but also for the downstream tasks themselves through a process called fine-tuning.
BERT is trained with a masked language modeling objective over WordPiece tokens and a next sentence prediction objective. Its architecture consists of a stack of multi-layer Transformer encoders, each having a self-attention mechanism with multiple attention heads. Because self-attention imposes no locality bias, BERT’s Transformer encoder can better capture long-distance relationships among concepts.
BERT can be further pre-trained for a specific domain or fine-tuned for a specific task [47]. In particular, fine-tuning for token-level classification tasks is supported by putting a linear layer, which takes as input the final hidden state of each token, on top of the BERT model.
4 https://storage.googleapis.com/bert_models/2018_11_03/multilingual_L-12_H-768_A-12.zip
5 http://www.github.com/iamyihwa

C. ADDITIONAL EMBEDDINGS
It has been demonstrated that concatenating contextualized embeddings with standard embeddings usually leads to improved results [10], [17]. Following this, for our experiments we used the concatenation of Flair embeddings with Spanish general (not domain-specific) FastText embeddings [8], domain-specific Spanish biomedical FastText embeddings [18], byte-pair encoded (BPE) embeddings [48] and character embeddings [49]. The results of models with and without these additional embeddings are presented.
General FastText embeddings for Spanish were trained using the full dump of Spanish-language Wikipedia, while the Spanish domain-specific biomedical embeddings, which use the FastText architecture, were trained over the SciELO6 corpus with 100 million tokens and the health section of Wikipedia with 82 million tokens.
Character embeddings are generated using an RNN model and are then concatenated with the other types of word embeddings in the model.
While the BPE model provides subword embeddings for 275 languages, we used only one language from this model. It produces relatively lightweight embeddings, as they consist of subword tokens of words. This method has been shown to deal well with unknown words and to produce results on a par with standard word embeddings.

D. ENTITY EXTRACTION
In the PharmacoNER task, there are 4 relevant types of entity mentions, although only the first 3 types are used for the official evaluation:
• Normalizables (Normalizable): mentions of biomedical concepts which can be normalized to the SNOMED-CT and ChEBI vocabularies;
• No_Normalizables (Non-normalizable): biomedical concepts which cannot be normalized to the given vocabularies;
• Proteínas (Proteins): mentions of genes and proteins;
• Unclear (Unclear): general substance mentions.
The problem of biomedical NER can be framed as a sequence labeling task where the goal is to extract the correct spans of entities. We therefore used a BIO schema, in which each token in a document is classified as the [B]eginning, [I]nside, or [O]utside of an entity mention.
Other than for the BERT experiments, all experiments were conducted using the Flair framework7, which is built on top of PyTorch and provides a convenient means of experimenting with different combinations of word embeddings. It provides an off-the-shelf neural-based system supporting entity extraction. We train a Long Short-Term Memory (LSTM) network with a hidden state of 256 dimensions, learning rate 0.1 and mini-batch size of 8, optimized with Adam. We train for 150 epochs, and the model that performs best during training on the validation set provided by the organizers of the competition is used, to prevent overfitting.
We were unable to conveniently experiment with BERT embeddings using the Flair framework, and instead preferred the Google Cloud TensorFlow TPU setup both for training contextualized word embeddings and for downstream task fine-tuning and predictions, as it works much faster8. However, at the time of writing, TPUs did not support inference on downstream tasks, and it was necessary to switch over to CPU instances for this step.
We used a Conditional Random Fields loss [50], as it has been shown to increase accuracy on NER tasks. The training and evaluation batch sizes were set to 32 and 8, respectively, and the learning rate was set to 5e−5. The maximum sequence length was set to 160.
Despite the common advice to fine-tune the BERT model for just 3-10 epochs, we fine-tuned it for 30 epochs, as we noticed that this improved the predictions.

E. PHARMACONER CORPUS
The PharmacoNER corpus was used for training and testing our models. It consists of 1000 annotated SPACCC articles derived from open access Spanish medical publications in SciELO, an electronic library where complete full-text articles from scientific journals of Latin America, South Africa, and Spain are systematically collected and stored9.
Table 1 shows summary statistics of the PharmacoNER corpus. Results are scored with the scoring tool distributed by the organizers of the challenge. For concepts, true positives are strict (the system concept span must match the beginning and end of a gold concept span exactly). We report micro-averaged results of the lenient evaluation since that was the metric used to score the shared task.
TABLE 1. Statistics on PharmacoNER corpus
Size (sent)       Size (words)        Entity types and counts
16,504            396,988             Normalizables (4,426), No_Normalizables (55), Proteinas (2,291), Unclear (159)
16.5 sent/case    396.2 words/case
For training the model, we combined the training and development corpora (yielding 11,970 sentences for the merged corpora) and randomly selected 10% of them for validation purposes.

F. LANGUAGE MODEL TRAINING DATASET
We selected a subset of SciELO text based on some heuristics, to be in line with the corpus used for training and testing the model. In particular, we chose articles whose specified subject area is Health Sciences and then selected text from particular sections of those articles. Specifically, we selected text starting with the section headings ‘Descripción del caso’, ‘Presentación de caso’, ‘Descripción de caso clínico’, or ‘Caso clínico’ (case description, case presentation, clinical case description, clinical case) and ending with the sections ‘Bibliografía’ or ‘Referencias’ (bibliography, references). In this way we retrieved 1,368,080 sentences with 86,851,275 tokens.
6 SciELO.org
7 https://github.com/zalandoresearch/flair
8 https://cloud.google.com/ml-engine/docs/tensorflow/using-tpus
9 http://www.scielo.org
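The section-selection heuristic of Section II-F can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the script used to build the corpus: each article is assumed to be a plain-text string with every section heading on its own line, `area` is a hypothetical metadata field holding the article's subject area, and the helper names are hypothetical.

```python
import re

# Headings quoted in Section II-F; matching them as whole lines is an assumption.
START_HEADINGS = (
    "Descripción del caso",
    "Presentación de caso",
    "Descripción de caso clínico",
    "Caso clínico",
)
END_HEADINGS = ("Bibliografía", "Referencias")


def heading_regex(headings):
    """Compile a case-insensitive regex matching any heading that sits alone on a line."""
    alternatives = "|".join(re.escape(h) for h in headings)
    # [ \t]* (not \s*) so that a match never swallows the surrounding newlines.
    return re.compile(rf"^[ \t]*(?:{alternatives})[ \t]*$", re.MULTILINE | re.IGNORECASE)


START_RE = heading_regex(START_HEADINGS)
END_RE = heading_regex(END_HEADINGS)


def extract_case_text(article, area):
    """Return the clinical-case text of one article, or None if the article does not qualify.

    `area` is a hypothetical metadata field with the article's subject area.
    """
    if area != "Health Sciences":
        return None
    start = START_RE.search(article)
    if start is None:
        return None
    # Look for the closing heading only after the opening one.
    end = END_RE.search(article, start.end())
    if end is None:
        return None
    return article[start.end():end.start()].strip()


if __name__ == "__main__":
    sample = "Resumen\nTexto.\nCaso clínico\nVarón de 45 años con fiebre.\nBibliografía\n1. Ref."
    print(extract_case_text(sample, "Health Sciences"))
```

Articles without both an opening and a closing heading are skipped rather than guessed at, which keeps the selected text close to the clinical case sections used in PharmacoNER.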
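To make the BIO schema of Section II-D concrete, the sketch below encodes entity spans, given as (start token, exclusive end token, label), into one tag per token and decodes the tags back into spans. The helper names, the example sentence and the upper-case label strings are illustrative assumptions, not the exact format of the PharmacoNER files.

```python
def spans_to_bio(n_tokens, entities):
    """Encode entity spans (start, end_exclusive, label) as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in entities:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags):
    """Decode BIO tags back into (start, end_exclusive, label) spans."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # the sentinel "O" closes a trailing span
        continues_span = tag.startswith("I-") and tag[2:] == label
        if not continues_span:
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            if tag != "O":
                # A "B-" tag, or a stray "I-" tag, opens a new span.
                start, label = i, tag[2:]
    return spans


if __name__ == "__main__":
    tokens = ["Se", "administró", "ácido", "acetilsalicílico", "y", "heparina", "."]
    entities = [(2, 4, "NORMALIZABLES"), (5, 6, "NORMALIZABLES")]
    tags = spans_to_bio(len(tokens), entities)
    print(list(zip(tokens, tags)))
```

The decoder treats an "I-" tag that does not continue the current entity as the start of a new one, a common lenient convention for tagger outputs that are not well-formed.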
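The pooling idea behind Pooled Contextualized String Embeddings (Section II-A) can be illustrated with a toy memory: every contextual vector produced for a word is stored, a dimension-wise pooling operation is applied over that history, and the pooled vector is concatenated with the current contextual vector. The class name, the two-dimensional toy vectors and the offered pooling options (min, max, mean) are assumptions for illustration; in the real model the contextual vectors come from the character LM.

```python
from collections import defaultdict


class PooledEmbeddingMemory:
    """Toy sketch: remember every contextual vector seen for a word and pool over that history."""

    POOLING = {
        "min": min,
        "max": max,
        "mean": lambda values: sum(values) / len(values),
    }

    def __init__(self, pooling="mean"):
        self.pool = self.POOLING[pooling]
        self.memory = defaultdict(list)  # word -> list of contextual vectors seen so far

    def embed(self, word, contextual_vector):
        vector = list(contextual_vector)
        # Store the current occurrence first, so the pooled part always includes it.
        self.memory[word].append(vector)
        history = self.memory[word]
        pooled = [self.pool([v[d] for v in history]) for d in range(len(vector))]
        # Final representation: current contextual vector concatenated with the pooled one.
        return vector + pooled


if __name__ == "__main__":
    memory = PooledEmbeddingMemory("mean")
    print(memory.embed("hemoglobina", [1.0, 0.0]))
    print(memory.embed("hemoglobina", [3.0, 2.0]))
```

Because the memory persists across sentences, a later occurrence of a word in a poorly specified context still carries information aggregated from earlier, more informative contexts, which is the behaviour motivated above for pairs such as creatinina and hemoglobina.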


TABLE 2. Results of experiments (“-”: not available)

Entity type         System                  Precision   Recall    F-score
Overall             Sun’s BERT              90.46%      88.06%    89.24%
                    Stoeckel’s Flair        90.79%      90.30%    90.52%
                    Xiong’s BERT            91.23%      90.88%    91.05%
                    Flair_Sc_ext2 (ours)    91.97%      89.74%    90.84%
                    BERT_Sc (ours)          89.29%      87.83%    88.55%
Normalizables       Sun’s BERT              -           -         -
                    Stoeckel’s Flair        -           -         -
                    Xiong’s BERT            94.26%      92.91%    93.58%
                    Flair_Sc_ext2 (ours)    95.21%      91.88%    93.46%
                    BERT_Sc (ours)          91.48%      91.67%    91.57%
Proteinas           Sun’s BERT              -           -         -
                    Stoeckel’s Flair        -           -         -
                    Xiong’s BERT            87.87%      89.41%    88.63%
                    Flair_Sc_ext2 (ours)    88.56%      88.36%    88.46%
                    BERT_Sc (ours)          86.74%      84.52%    85.61%
No Normalizables    Sun’s BERT              -           -         -
                    Stoeckel’s Flair        -           -         -
                    Xiong’s BERT            100.00%     20.00%    33.33%
                    Flair_Sc_ext2 (ours)    0.00%       0.00%     0.00%
                    BERT_Sc (ours)          0.00%       0.00%     0.00%
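The F-score in Table 2 is the harmonic mean of precision and recall, F = 2PR / (P + R). The short check below recomputes it from the overall precision and recall of four rows of the table; the function and dictionary names are our own.

```python
def f_score(precision, recall):
    """F1: harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision/recall pairs copied from Table 2 (in %).
OVERALL = {
    "Sun's BERT": (90.46, 88.06),
    "Xiong's BERT": (91.23, 90.88),
    "Flair_Sc_ext2 (ours)": (91.97, 89.74),
    "BERT_Sc (ours)": (89.29, 87.83),
}

if __name__ == "__main__":
    for system, (p, r) in OVERALL.items():
        print(f"{system}: F = {f_score(p, r):.2f}%")
```

Rounded to two decimals, this reproduces the overall F-scores of 89.24%, 91.05%, 90.84% and 88.55%, and the difference between 91.05% and 90.84% is the 0.21-point gap to the best system discussed in Section III.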

We also used the same corpora for training the BERT language representation model, with the vocabulary size set to 128,000.
The constructed domain-specific Flair embeddings, Flair_Sc, are compared with the general pre-trained Flair embeddings that are part of the Flair API, Flair_G10. These are trained on a dump of Spanish Wikipedia dated August 201811.

G. LANGUAGE MODEL TRAINING
The Flair_Sc LM was trained until the perplexity reached 1.92. The settings used to train the LM are: hidden size 1150, 3 layers, maximum sequence length 240, mini-batch size 100 and 1000 epochs.
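Perplexity is the exponential of the average negative log-likelihood per predicted symbol. Assuming the 1.92 reported above is measured per character (the natural unit for a character-level LM such as Flair), the sketch below shows the computation and converts 1.92 into the corresponding average cross-entropy; the function names are hypothetical.

```python
import math


def perplexity(char_log_probs):
    """Perplexity = exp(mean negative log-likelihood per character), with natural-log probabilities."""
    if not char_log_probs:
        raise ValueError("need at least one character")
    mean_nll = -sum(char_log_probs) / len(char_log_probs)
    return math.exp(mean_nll)


def nats_per_char(ppl):
    """Average cross-entropy, in nats per character, implied by a perplexity."""
    return math.log(ppl)


def bits_per_char(ppl):
    """The same quantity expressed in bits per character."""
    return math.log2(ppl)


if __name__ == "__main__":
    # A model spreading probability uniformly over 4 characters has perplexity 4.
    print(perplexity([math.log(0.25)] * 10))
    # Cross-entropy implied by the reported Flair_Sc perplexity.
    print(round(nats_per_char(1.92), 3), round(bits_per_char(1.92), 3))
```

Under that assumption, a perplexity of 1.92 corresponds to roughly 0.652 nats (0.941 bits) per character, i.e. the model is on average about as uncertain as a choice between two equally likely next characters.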
The training of Flair_Sc LM was done using 1 GPU FIGURE 1. Training and validation loss and F-score Dependency between the
instance.
number of epochs and either loss (top figure) or F1 score (bottom figure) for Flair_Sc_ext2 model.
The BERT language representation was trained using Tensor Processing Unit (TPU) instances in Google Colab with 1B training steps. TPUs are designed to scale operations efficiently across different machines, which makes tensor calculations faster than on GPUs. Google Cloud persistent storage is required for storing and uploading the weights used in training. Moreover, Google Colab shuts down its server every 8 hours, so training has to be resumed manually. Overall, it took more than 4 days to train the BERT language representation, substantially longer than it takes to train the Flair_Sc LM.
III. RESULTS
The comparative results of the experiments are presented in Table 2, where we report the results of our best Flair-based model and of the BERT model:
• Flair_Sc_ext2: the extended model is trained using the custom SciELO Flair embeddings Flair_Sc, SciELO FastText embeddings, BPE embeddings and character embeddings;
• BERT_Sc: BERT-based word embeddings are trained on the SciELO corpus. Subsequently, the BERT model is fine-tuned for the downstream task.
To compare our results with others, we selected the top results in the challenge leaderboard, omitting results for which no system description was provided.
For the best model, precision is higher than recall for all entity types, especially for Normalizables entities. This means that the entities the model predicts are usually correct, but it misses a larger share of the gold standard entities.
Indeed, compared with the best systems' results, our system achieves higher precision but relatively lower recall. Overall, our results are 0.21% behind the best system of Xiong et al. [38] for this task.
No_Normalizables entities, which comprise the minority class, are not captured by our models. Techniques for tackling the class imbalance should be considered in future experiments with sequence labeling architectures.
IV. DISCUSSION
Number of training epochs: Fig. 1 shows the evolution of the loss and F-score over the number of epochs. It can be seen that the loss becomes steady after around 27 epochs and that the test F-score stabilizes at around the same point. Overall, the test set loss curve resembles the validation set loss curve, which suggests that the validation set is a good proxy for measuring model performance.
Ablation analysis: For our ablation analysis, we explored the following additional combinations of word embeddings:
• Flair_G_ext: the model is trained using Spanish general-domain Flair embeddings Flair_G, Spanish general FastText embeddings and BPE embeddings;
• Standard_Sc: SciELO FastText embeddings with the subword information property, BPE embeddings and character embeddings are used;
• Flair_Sc: based only on custom SciELO Flair embeddings;
• Flair_Sc_ext: the custom SciELO Flair embeddings, general Spanish FastText embeddings and BPE embeddings are used.
The results of the different variations of stacked word embeddings are shown in Table 3.
TABLE 3. Results of ablation analysis
Model name     Embedding types                                   Precision  Recall   F-score
Flair_G_ext    General Flair + general FastText + BPE            89.71%     89.47%   89.59%
Standard_Sc    SciELO FastText + BPE + char emb                  86.90%     86.81%   86.85%
Flair_Sc       SciELO Flair                                      88.91%     88.38%   88.65%
Flair_Sc_ext   SciELO Flair + general FastText + BPE             90.95%     89.47%   90.20%
Flair_Sc_ext2  SciELO Flair + SciELO FastText + BPE + char emb   91.97%     89.74%   90.84%
In general, LM-based embeddings lead to better results than the standard ones. It can also be seen that the model enriched with different types of word embeddings gives better results in terms of precision, recall and F-score. Domain-specific word embeddings improve the results, even though they are much smaller in size than general-domain ones. Augmenting word embeddings with additional subword-level embeddings such as FastText, BPE and character embeddings further improves the results.
We also experimented with searching for concepts in SNOMED-CT using the Meaning Cloud tool¹²; however, this did not work well, as many concepts in the shared task were annotated based on their synonyms.
10 https://github.com/zalandoresearch/flair/issues/80
11 https://dumps.wikimedia.org/eswiki/20180801/
12 https://www.meaningcloud.com/
FIGURE 2. Distribution of errors for short (fewer than 3 terms) and long (3 or more terms) entities.
TABLE 4. Distribution of errors for different models
                     Short (≤ 2 terms)                       Long (≥ 3 terms)
Error type           Flair_Sc_ext2  Flair_Sc  Standard_Sc    Flair_Sc_ext2  Flair_Sc  Standard_Sc
Misclassified FP     21             28        30             1              0         2
Shorter FP           25             27        22             33             38        31
Longer FP            37             42        58             1              0         1
Intersections FP     0              2         3              0              1         0
No intersections FP  88             84        103            8              4         6
FN                   30             50        72             2              1         1
Error analysis: For error analysis, we split the gold standard entities into 2 groups: short entities, with a length of at most 2 terms, and long entities, with a length of at least 3 terms. For the best model, Flair_Sc_ext2, the origin and distribution of errors are presented in Fig. 2.
It can be seen that the majority of errors are for short predicted entities that do not even partially overlap with gold standard entities (No intersections false positives (FP)). Indeed, many biomedical entities are acronyms and abbreviations, which can easily be misclassified based on the casing and length of the entities. Interestingly, the second main source of errors for short predicted entities is that the model predicts two entities where the gold standard has a single entity (Longer FP). A smaller number of errors are related to short gold standard entities that the model fails to detect (false negatives, FN). For long entities, the main source of error is that predicted entities are shorter than required (Shorter FP), contributing nearly 75% of the total

error.
In Table 4, we present a comparison of errors among 3 models: the best model, Flair_Sc_ext2; Flair_Sc, which uses only Flair embeddings trained on the target corpus; and Standard_Sc, which is based on a set of standard embeddings.
Interestingly, the main discrepancy in the number of errors between the Flair_Sc model and the best model is the larger number of undetected short entities (FN). All other discrepancies in errors between the two models are in the range of 1-7 in either direction.
Compared with the best model, the main source of errors for Standard_Sc is likewise the falsely predicted short entities without intersections with gold standard ones (No intersections FP), with almost 15% more predicted FP. This suggests that the best model, which uses contextual embeddings, learns the meaning of acronyms, abbreviations and short upper-cased words in general more effectively, assigning them biomedical labels with more caution.
This comparison also shows that the lower-performing models are much worse at detecting the boundaries of short biomedical concepts, often predicting longer concepts: 5 more incorrectly predicted concepts for the Flair_Sc model and 21 more for the Standard_Sc model.
It is interesting to observe that for long predicted concepts, the absolute numbers and distribution of errors for the best Flair_Sc_ext2 model and the Standard_Sc model are mostly the same. However, the Flair_Sc model makes slightly more errors by predicting concepts shorter than the gold standard ones (i.e., predicting three consecutive terms instead of four, etc.).
In Table 5, we present two examples of sentences with underlined gold standard and predicted entities. The sentences were chosen from the representative groups of the most common errors for different models. Here, FP is short for FP without intersections. It can be observed that in both examples the Standard_Sc model predicted long entities that were either FP or longer versions of gold standard entities. Flair-based models also often confuse short upper-cased entities, but in fewer cases. Interestingly, in the second example, although both the Flair_Sc and Standard_Sc models detected the 'USA' entity as a PROTEIN, the Flair_Sc_ext2 model, which combines embeddings from both models, did not give this entity a biomedical label.
Regarding parameter settings, we did not perform hyperparameter selection for either the Flair or the BERT models; doing so might further improve model quality.
V. CONCLUSION
In this work, we have explored the application of transfer learning techniques, in particular language representation-based word embeddings, to the problem of extracting biomedical entities from 1000 Spanish clinical case narratives. By leveraging the knowledge from a huge amount of unlabeled data, language model pre-training makes it possible to build a high-quality NER system even with this small amount of annotated data.
With this aim, we trained domain-specific Spanish language models, in particular Flair and BERT, to derive contextualized word embeddings and applied them to the PharmaCoNER biomedical NER data, achieving competitive results. We showed that domain-specific word embeddings outperform general embeddings, despite being trained on a smaller corpus. Moreover, we demonstrated that stacking word embeddings of different nature can improve model performance.
Error analysis has shown that the main source of errors for all models is over-zealous recognition of short entities. Indeed, biomedical entities are often short and upper-cased, and can easily be mixed up with other abbreviated short words. Testing the approach on other Spanish health-related texts, such as social media [51], which share some of these characteristics (e.g., a large number of abbreviations, lack of grammatical structure and punctuation marks, etc.) and add others (e.g., patient-oriented terminology not included in any resource, slang words, etc.), could help to cope with these phenomena.
Moreover, standard embedding-based models often fail by detecting long false positive entities or longer versions of gold standard entities (in particular, FastText models). However, the ability to detect long entities could be beneficial in particular scenarios.
One direction for improvement could be a more sophisticated use of contextualized embeddings. For example, they could be incorporated into state-of-the-art NER architectures, such as graph-based NNs or NNs with a dependency tree-based attention mechanism, to better capture long-distance relationships between biomedical entities. To handle the class imbalance, different strategies such as loss function modification could be applied in future work.
ACKNOWLEDGMENT
We would like to thank the anonymous reviewers for their invaluable feedback. We are also thankful to Leonardo Campillos, who kindly helped us prepare the analysis of the peculiarities of Spanish clinical texts.
REFERENCES
[1] Gonzalez-Agirre A, Marimon M, Intxaurrondo A, Rabal O, Villegas M, Krallinger M. PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity Recognition track. In: Proceedings of the 5th Workshop on BioNLP Open Shared Tasks; 2019. p. 1–10. Available from: https://github.com/PlanTL-SANIDAD/SPACCC.
[2] Segura-Bedmar I, Revert R, Martínez P. Detecting drugs and adverse events from Spanish health social media streams. In: Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi) at EACL 2014; 2014. p. 106–115.
[3] Akhtyamova L, Ignatov A, Cardiff J. A Large-Scale CNN Ensemble for Medication Safety Analysis. Natural Language Processing and Information Systems (NLDB 2017), Lecture Notes in Computer Science. 2017;10260.


TABLE 5. Examples of errors in recognizing biomedical entities by different models
Example 1
Correct annotation: IgG 317, IgA 1446, IgM 15 mg/dl, cadena ligera libre (CLL, nefelometría Free-Lite®) kappa 4090 ng/ml, lambda 1.
Flair_Sc_ext2 (types of errors: FP): IgG 317, IgA 1446, IgM 15 mg/dl, cadena ligera libre (CLL, nefelometría Free-Lite®) kappa 4090 ng/ml, lambda 1.
Flair_Sc (types of errors: FP): IgG 317, IgA 1446, IgM 15 mg/dl, cadena ligera libre (CLL, nefelometría Free-Lite®) kappa 4090 ng/ml, lambda 1.
Standard_Sc (types of errors: FP): IgG 317, IgA 1446, IgM 15 mg/dl, cadena ligera libre (CLL, nefelometría Free-Lite®) kappa 4090 ng/ml, lambda 1.
Example 2
Correct annotation: proteína S-100 (Dako, L1845, USA, prediluída), neurofilamentos (Biogenex 6670-0154, USA), enolasa neuroespecífica NSE,
Flair_Sc_ext2 (types of errors: FP, shorter FP): proteína S-100 (Dako, L1845, USA, prediluída), neurofilamentos (Biogenex 6670-0154, USA), enolasa neuroespecífica NSE,
Flair_Sc (types of errors: FP, shorter FP): proteína S-100 (Dako, L1845, USA, prediluída), neurofilamentos (Biogenex 6670-0154, USA), enolasa neuroespecífica NSE,
Standard_Sc (types of errors: FP, longer FP): proteína S-100 (Dako, L1845, USA, prediluída), neurofilamentos (Biogenex 6670-0154, USA), enolasa neuroespecífica NSE,

[4] Kim D, Lee J, So CH, Jeon H, Jeong M, Choi Y, et al. A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining. IEEE Access. 2019;7:73729–73740. Available from: https://ieeexplore.ieee.org/document/8730332/.
[5] Jin Q, Dhingra B, Liu Z, Cohen WW, Lu X. PubMedQA: A Dataset for Biomedical Research Question Answering. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing; 2019. p. 2567–2577. Available from: http://arxiv.org/abs/1909.06146.
[6] Mikolov T, Chen K, Corrado G, Dean J. Efficient Estimation of Word Representations in Vector Space. In: arXiv preprint arXiv:1301.3781; 2013. Available from: http://arxiv.org/abs/1301.3781.
[7] Pennington J, Socher R, Manning CD. GloVe: Global Vectors for Word Representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP); 2014. Available from: https://nlp.stanford.edu/pubs/glove.pdf.
[8] Bojanowski P, Grave E, Joulin A, Mikolov T. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics. 2017 7;5:135–146. Available from: http://arxiv.org/abs/1607.04606.
[9] Habibi M, Weber L, Neves M, Wiegandt DL, Leser U. Deep learning with word embeddings improves biomedical named entity recognition. Bioinformatics. 2017 7;33(14):i37–i48. Available from: https://academic.oup.com/bioinformatics/article/33/14/i37/3953940.
[10] Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, et al. Deep contextualized word representations. arXiv preprint arXiv:1802.05365. 2018 2; Available from: http://arxiv.org/abs/1802.05365.
[11] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: arXiv preprint arXiv:1810.04805; 2018. Available from: http://arxiv.org/abs/1810.04805.
[12] Han X, Eisenstein J. Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing; 2019. p. 4237–4247. Available from: https://github.com/xhan77/.
[13] Howard J, Ruder S. Universal Language Model Fine-tuning for Text Classification. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics; 2018. p. 328–339. Available from: http://nlp.fast.ai/ulmfit.
[14] Dai AM, Le QV. Semi-supervised Sequence Learning. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, editors. Advances in Neural Information Processing Systems 28. Curran Associates, Inc.; 2015. p. 3079–3087. Available from: http://papers.nips.cc/paper/5949-semi-supervised-sequence-learning.pdf.
[15] Radford A. Improving Language Understanding by Generative Pre-Training. 2018. Available from: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf and https://www.semanticscholar.org/paper/Improving-Language-Understanding-by-Generative-Radford/cd18800a0fe0b668a1cc19f2ec95b5003d0a5035.
[16] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention Is All You Need. In: Advances in Neural Information Processing Systems. 2017 6; p. 5998–6008. Available from: http://arxiv.org/abs/1706.03762.
[17] Akbik A, Blythe D, Vollgraf R. Contextual String Embeddings for Sequence Labeling. In: COLING; 2018. Available from: https://github.com/zalandoresearch/flair.
[18] Soares F, Villegas M, Gonzalez-Agirre A, Krallinger M, Armengol-Estapé J. Medical Word Embeddings for Spanish: Development and Evaluation. In: Proceedings of the 2nd Clinical Natural Language Processing Workshop; 2019. p. 124–133. Available from: http://doi.org/10.5281/zenodo.2542722.
[19] Akhtyamova L. Named Entity Recognition in Spanish Biomedical Literature: Short Review and Bert Model. In: 2020 26th Conference of Open Innovations Association (FRUCT). IEEE; 2020. p. 1–7. Available from: https://ieeexplore.ieee.org/document/9087359/.
[20] Eftimov T, Koroušić Seljak B, Korošec P. A rule-based named-entity recognition method for knowledge extraction of evidence-based dietary recommendations. PLoS ONE. 2017;12(6). Available from: https://doi.org/10.1371/journal.pone.0179488.
[21] Funk C, Baumgartner W, Garcia B, Roeder C, Bada M, Cohen KB, et al. Large-scale biomedical concept recognition: An evaluation of current automatic annotators and their parameters. BMC Bioinformatics. 2014 2;15(1).
[22] Marimon M, Gonzalez-Agirre A, Intxaurrondo A, Rodríguez H, Antonio Lopez Martin J, Villegas M, et al. Automatic De-Identification of Medical Texts in Spanish: the MEDDOCAN Track, Corpus, Guidelines, Methods and Evaluation of Results. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019); 2019. p. 618–638. Available from: https://github.com/PlanTL-SANIDAD.
[23] Perez N, García-Sardiña L, Serras M, Pozo AD. Vicomtech at MEDDOCAN: Medical Document Anonymization. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019); 2019. p. 698–703. Available from: https://github.com/PlanTL-SANIDAD/SPACCC.
[24] López P, D MC, Alfonso Ureña-López L, Teresa Mart M. Anonymization of Clinical Reports in Spanish: a Hybrid Method Based on Machine Learning and Rules. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019); 2019. p. 688–695. Available from: https://catalog.ldc.upenn.edu/LDC2018T01.
[25] Lee KJ, Hwang YS, Kim S, Rim HC. Biomedical named entity recognition using two-phase model based on SVMs. Journal of Biomedical Informatics. 2004 12;37(6):436–447.
[26] Kim JD, Ohta T, Tateisi Y, Tsujii J. GENIA corpus–a semantically annotated corpus for bio-textmining. Bioinformatics. 2003 7;19(Suppl 1):i180–i182. Available from: https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btg1023.
[27] Nguyen TVT, Moschitti A, Riccardi G. Kernel-based reranking for named-entity extraction. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters. Association for Computational Linguistics (ACL); 2010. p. 901–909. Available from: https://dl.acm.org/citation.cfm?id=1944670.
[28] Collins M. Ranking algorithms for named-entity extraction. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics (ACL); 2001. p. 489–496.
[29] Björne J, Salakoski T. Generalizing Biomedical Event Extraction. In: Proceedings of BioNLP Shared Task 2011 Workshop; 2011. p. 183–191. Available from: http://svmlight.joachims.org/svm_.
[30] Takeuchi K, Collier N. Bio-medical entity extraction using support vector machines. Artificial Intelligence in Medicine. 2005;33(2):125–137.
[31] Isozaki H, Kazawa H. Efficient support vector classifiers for named entity recognition. Association for Computational Linguistics (ACL); 2002. p. 1–7.
[32] Patra R, Saha SK. A kernel-based approach for biomedical named entity recognition. The Scientific World Journal. 2013;2013.
[33] Li D, Huang L, Ji H, Han J. Biomedical Event Extraction Based on Knowledge-driven Tree-LSTM. In: Proceedings of NAACL-HLT 2019. Association for Computational Linguistics; 2019. p. 1421–1430.


[34] Simeonova L, Simov K, Osenova P, Nakov P. A Morpho-Syntactically Informed LSTM-CRF Model for Named Entity Recognition. In: arXiv preprint arXiv:1908.10261; 2019. Available from: http://github.com/lilia-simeonova/.
[35] Sharma S, Daniel R. BioFLAIR: Pretrained Pooled Contextualized Embeddings for Biomedical Sequence Labeling Tasks. arXiv preprint arXiv:1908.05760. 2019 8; Available from: http://arxiv.org/abs/1908.05760.
[36] Miller T, Geva A, Dligach D. Extracting Adverse Drug Event Information with Minimal Engineering. In: Proceedings of the 2nd Clinical Natural Language Processing Workshop; 2019. p. 22–27. Available from: https://www.aclweb.org/anthology/W19-1903.
[37] Basaldella M, Collier N. BioReddit: Word Embeddings for User-Generated Biomedical NLP. In: Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019). Association for Computational Linguistics (ACL); 2019. p. 34–38.
[38] Xiong Y, Shen Y, Huang Y, Chen S, Tang B, Wang X, et al. A Deep Learning-Based System for PharmaCoNER. In: Proceedings of the 5th Workshop on BioNLP Open Shared Tasks; 2019. p. 33–37. Available from: https://github.com/PlanTL-SANIDAD/SPACCC_POS-.
[39] Stoeckel M, Hemati W, Mehler A. When Specialization Helps: Using Pooled Contextualized Embeddings to Detect Chemical and Biomedical Entities in Spanish. In: Proceedings of the 5th Workshop on BioNLP Open Shared Tasks; 2019. p. 11–15. Available from: www.github.com/zalandoresearch/flair.
[40] Sun C, Yang Z. Transfer Learning in Biomedical Named Entity Recognition: An Evaluation of BERT in the PharmaCoNER task. In: Proceedings of the 5th Workshop on BioNLP Open Shared Tasks; 2019. p. 100–104.
[41] Rivera Zavala RM, Martínez P. Deep neural model with enhanced embeddings for pharmaceutical and chemical entities recognition in Spanish clinical text. In: Proceedings of the 5th Workshop on BioNLP Open Shared Tasks; 2019. p. 38–46. Available from: https://ufal.mff.
[42] Trask A, Michalak P, Liu J. sense2vec - A Fast and Accurate Method for Word Sense Disambiguation In Neural Word Embeddings. arXiv preprint arXiv:1511.06388. 2015 11; Available from: http://arxiv.org/abs/1511.06388.
[43] Lange L, Adel H, Strötgen J. Closing the Gap: Joint De-Identification and Concept Extraction in the Clinical Domain. arXiv preprint arXiv:2005.09397. 2020 5; Available from: https://arxiv.org/abs/2005.09397.
[44] Sánchez León F, Ledesma AG. Annotating and normalizing biomedical NEs with limited knowledge. arXiv preprint arXiv:1912.09152. 2019; Available from: https://arxiv.org/abs/1912.09152v1.
[45] Mikolov T, Chen K, Corrado G, Dean J. Distributed Representations of Words and Phrases and their Compositionality. Advances in Neural Information Processing Systems. 2013; p. 3111–3119. Available from: https://arxiv.org/pdf/1310.4546.pdf.
[46] Akbik A, Bergmann T, Vollgraf R. Pooled Contextualized Embeddings for Named Entity Recognition. In: NAACL; 2019. Available from: https://github.com/zalandoresearch/flair.
[47] Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2019;(btz682). Available from: https://github.com/dmis-lab/biobert.
[48] Heinzerling B, Strube M. BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018); 2018. p. 18–1473. Available from: https://aclweb.org/anthology/papers/L/L18/L18-1473/.
[49] Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C. Neural Architectures for Named Entity Recognition. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA, USA: Association for Computational Linguistics; 2016. p. 260–270. Available from: http://aclweb.org/anthology/N16-1030.
[50] Lafferty JD, McCallum A, Pereira FCN. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In: Proceedings of the Eighteenth International Conference on Machine Learning. ICML '01. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; 2001. p. 282–289. Available from: http://dl.acm.org/citation.cfm?id=645530.655813.
[51] Martínez P, Martínez JL, Segura-Bedmar I, Moreno-Schneider J, Luna A, Revert R. Turning user generated health-related content into actionable knowledge through text analytics services. Computers in Industry. 2016 5;78:43–56.
LILIYA AKHTYAMOVA (ORCID: 0000-0003-4338-1483) Department of Computing, Technological University Dublin, Dublin, Ireland.
Liliya Akhtyamova received the M.S. degree in Applied Mathematics and Physics from the Moscow Institute of Physics and Technology (State University), Dolgoprudniy, Moscow obl., Russia, in 2017. She is currently pursuing the Ph.D. degree in Information Technology at the Technological University Dublin, Dublin, Ireland. In Spring 2019, she did a traineeship under an Erasmus grant at the Carlos III University of Madrid, Madrid, Spain. This paper resulted from the collaborative work performed at that time. Her research interests include biomedical text mining, named entity recognition and deep learning.
PALOMA MARTINEZ (ORCID: 0000-0003-3013-3771) Computer Science and Engineering Department, University Carlos III of Madrid.
Paloma Martinez received the degree in Computer Science and the Ph.D. degree in Computer Science from the Universidad Politécnica de Madrid (Spain) in 1992 and 1998, respectively. She is the Head of the Human Language and Accessibility Technologies (HULAT) group in the Computer Science and Engineering Department, University Carlos III of Madrid.
Her research interests are human language technologies, with a focus on information extraction in the biomedical domain, and web accessibility. She is co-author of more than 40 articles in indexed journals and more than a hundred international conference contributions. She has been principal investigator and participated in over 40 national and international research projects.
Currently, she is a member of the Spanish Society for Natural Language Processing (SEPLN) and of the Dynamization Network for Activities on Natural Language Processing Technologies. She is a collaborator of the Spanish Center of Captioning and Audiodescription (CESyA).
+info: http://hulat.inf.uc3m.es/pmf.
KARIN VERSPOOR (ORCID: 0000-0002-8661-1544) is a Professor in the School of Computing and Information Systems at the University of Melbourne.
She received a BA in Computer Science and Cognitive Sciences from Rice University in 1993, the MSc degree in Cognitive Science and Natural Language from the University of Edinburgh (UK) in 1994, and a PhD in Cognitive Science from the University of Edinburgh (UK) in 1997.
After a post-doc at Macquarie University in Sydney, Australia, she spent 5 years in artificial intelligence start-ups, and then held research roles at Los Alamos National Laboratory, the University of Colorado School of Medicine and National Information Communications Technology Australia (NICTA), before joining the University of Melbourne.
This work was completed while she was a long-term visitor at the Carlos III University of Madrid (Spain), hosted by Paloma Martinez and with the support of the University of Melbourne. Her research focuses on biomedical text mining and clinical data analysis.


JOHN CARDIFF (ORCID: 0000-0003-1863-4557) Department of Computing, Technological University Dublin, Dublin, Ireland.
John Cardiff received the BA (Hons) degree in Computer Science from Trinity College Dublin, Ireland, in 1986 and the PhD degree from the University of Queensland, Australia, in 1990. He has over 25 years of lecturing and research experience and is currently a full-time lecturer at the Technological University Dublin (Tallaght Campus), Ireland.
He has previously held positions in the Department of Computer Science, Trinity College Dublin, and at the University of Queensland, Australia. He has served as Visiting Professor at the Technical University of Valencia, Spain, and Universitat Jaume I, Spain.
His research interests include Natural Language Processing and Social Media Analysis. He is the author or co-author of over 80 scientific papers.


This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.

5 pages
Bidirectional LSTM-CRF For Biomedical Named Entity Recognition
No ratings yet
Bidirectional LSTM-CRF For Biomedical Named Entity Recognition
4 pages
GenAI NLP Project
No ratings yet
GenAI NLP Project
20 pages
Clinical Concept Annotation With Contextual Word e
No ratings yet
Clinical Concept Annotation With Contextual Word e
31 pages
Approach For ML
No ratings yet
Approach For ML
4 pages
Article
No ratings yet
Article
24 pages
医疗联合模型
No ratings yet
医疗联合模型
19 pages
Infusing Machine Learning and Computational Linguistics Into Clinical Notes
No ratings yet
Infusing Machine Learning and Computational Linguistics Into Clinical Notes
10 pages
Mining and Classifying Medical Documents
No ratings yet
Mining and Classifying Medical Documents
4 pages
Automated HTML Data Extraction
No ratings yet
Automated HTML Data Extraction
2 pages
Orion: Shortest Path Estimation For Large Social Graphs
No ratings yet
Orion: Shortest Path Estimation For Large Social Graphs
9 pages
Mele Et Al. - 2020 - Topic Propagation in Conversational Search
No ratings yet
Mele Et Al. - 2020 - Topic Propagation in Conversational Search
4 pages
Crema Catalana Receta
No ratings yet
Crema Catalana Receta
1 page
Memory-Efficient Fast Shortest Path Estimation in Large Social Networks
No ratings yet
Memory-Efficient Fast Shortest Path Estimation in Large Social Networks
10 pages
Congestion Control Algorithm Interactions
No ratings yet
Congestion Control Algorithm Interactions
8 pages
BBR Congestion Control Paper
No ratings yet
BBR Congestion Control Paper
37 pages
Primary Curriculum Framework 2018
No ratings yet
Primary Curriculum Framework 2018
44 pages
Communication Is Life Blood of A Business Organization
100% (5)
Communication Is Life Blood of A Business Organization
7 pages
A Brief History of English
85% (13)
A Brief History of English
26 pages
Communication: Farah Aliya Fadhila Thariq Ahmad G
No ratings yet
Communication: Farah Aliya Fadhila Thariq Ahmad G
10 pages
English Question Tags Guide
No ratings yet
English Question Tags Guide
5 pages
1.1 Assessing Communication Skills Online PDF
No ratings yet
1.1 Assessing Communication Skills Online PDF
3 pages
An International Perspective On Language Policies, Practices and Proficiencies by Cunningham - Hatoss
No ratings yet
An International Perspective On Language Policies, Practices and Proficiencies by Cunningham - Hatoss
420 pages
NuMen Submissions
No ratings yet
NuMen Submissions
8 pages
Kapampangan Dialects
100% (4)
Kapampangan Dialects
29 pages
A Reflection Paper of Different Contexts of Language Learning Summary
No ratings yet
A Reflection Paper of Different Contexts of Language Learning Summary
2 pages
Grammar Part of Speech
No ratings yet
Grammar Part of Speech
98 pages
English III Text Book E5 ABAC
No ratings yet
English III Text Book E5 ABAC
109 pages
A Letter From An English Friend Official
No ratings yet
A Letter From An English Friend Official
4 pages
English 7 Lesson: Figures of Speech
No ratings yet
English 7 Lesson: Figures of Speech
7 pages
Lesson Plan Cami
No ratings yet
Lesson Plan Cami
4 pages
Understanding Noun Phrases
100% (1)
Understanding Noun Phrases
28 pages
Errors in English
0% (1)
Errors in English
20 pages
Personal Pronouns in Tamil and Dravidian
No ratings yet
Personal Pronouns in Tamil and Dravidian
5 pages
Language Teaching Styles at Blitar
No ratings yet
Language Teaching Styles at Blitar
16 pages
Mcm301 Collection of Old Papers
100% (2)
Mcm301 Collection of Old Papers
53 pages
Personality Adjectives Lesson Plan
100% (4)
Personality Adjectives Lesson Plan
5 pages
Food and Medicine: Yogi Hale Hendlin Jonathan Hope
No ratings yet
Food and Medicine: Yogi Hale Hendlin Jonathan Hope
197 pages
Poster Making Grading Rubric
No ratings yet
Poster Making Grading Rubric
1 page
English Phonetics for Learners
No ratings yet
English Phonetics for Learners
6 pages
Ielts - Course Objectives.2
100% (1)
Ielts - Course Objectives.2
3 pages
Instrumen Saringan LBI Menulis Tahun 4
No ratings yet
Instrumen Saringan LBI Menulis Tahun 4
13 pages
211 232milton
No ratings yet
211 232milton
22 pages
Syllabus ESP
No ratings yet
Syllabus ESP
33 pages
Student Notes: Reading Grammar Listening Speaking Writing
No ratings yet
Student Notes: Reading Grammar Listening Speaking Writing
9 pages
Managing Pair and Group Work - Wajnryb
No ratings yet
Managing Pair and Group Work - Wajnryb
3 pages