Paper 2
ARTICLE INFO

Keywords: Public health; Syndromic surveillance; Pharmacovigilance; Event forecasting; Disease tracking

ABSTRACT

Public health practitioners and researchers have used traditional medical databases to study and understand public health for a long time. Recently, social media data, particularly Twitter, has seen some use for public health purposes. Every large technological development in history has had an impact on the behaviour of society. The advent of the internet and social media is no different. Social media creates public streams of communication, and scientists are starting to understand that such data can provide some level of access into people’s opinions and situations. As such, this paper aims to review and synthesize the literature on Twitter applications for public health, highlighting current research and products in practice. A scoping review methodology was employed and four leading health, computer science and cross-disciplinary databases were searched. A total of 755 articles were retrieved, 92 of which met the criteria for review. From the reviewed literature, six domains for the application of Twitter to public health were identified: (i) Surveillance; (ii) Event Detection; (iii) Pharmacovigilance; (iv) Forecasting; (v) Disease Tracking; and (vi) Geographic Identification. From our review, we were able to obtain a clear picture of the use of Twitter for public health. We gained insights into interesting observations such as how the popularity of different domains changed with time, the diseases and conditions studied and the different approaches to understanding each disease, which algorithms and techniques were popular with each domain, and more.
* Corresponding author.
E-mail address: o.edo-osagie@uea.ac.uk (O. Edo-Osagie).
https://doi.org/10.1016/j.compbiomed.2020.103770
Received 12 November 2019; Received in revised form 1 April 2020; Accepted 17 April 2020
Available online 16 May 2020
0010-4825/© 2020 Elsevier Ltd. All rights reserved.
O. Edo-Osagie et al. Computers in Biology and Medicine 122 (2020) 103770

1. Introduction

Surveillance, described by the World Health Organisation (WHO) as ‘‘the cornerstone of public health security’’ [1], is aimed at the detection of elevated disease and death rates, implementation of control measures and reporting to the WHO of any event that may constitute a public health emergency or international concern. Syndromic surveillance can be described as the real-time (or near real-time) collection, analysis, interpretation, and dissemination of health-related data, to enable the early identification of the impact (or absence of impact) of potential human or veterinary public health threats that require effective public health action [2]. The task of syndromic surveillance is an undertaking motivated by the notion of public health. Public health has been defined as the science and art of preventing disease, prolonging life and promoting human health through organized efforts and informed choices of society, organizations, public and private, communities and individuals [3]. In this sense, the concept of health encompasses physical, emotional and social well-being.

Historically, public health practitioners have used data from multiple sources for measuring the burden of diseases and other health outcomes, preventing and controlling diseases and guiding healthcare activities. Emergency department attendances or general practitioner (GP, family doctor) consultations are some of the sources traditionally used to track specific syndromes such as influenza-like illnesses (ILI). With the proliferation of the internet and the advent of modern technology, potential new data sources present themselves. In recent years, researchers have recognized that social media platforms, such as Twitter and Facebook, could also provide data about national-level health and behaviour [4]. Among these social media platforms, Twitter offers a unique and potentially powerful data source due to its ease of access, real-time nature and richness in detail.

In this paper, we look towards Twitter with the aim of investigating and assessing its utility as a public health tool by performing a scoping review on the subject. While we seek to review the literature of public health research making use of Twitter, our interest in such literature is limited to research concerning the monitoring, detection and forecasting of public health conditions. We are not interested in social science research investigating the use of Twitter for recruitment or public awareness and dissemination of public health information. We are
similarly not interested in research concerned with opinion mining to understand public opinion on public health issues.

A scoping review such as ours is pertinent as there exist no broad and recent evidence reviews on the use of Twitter data for health research purposes. Wargon et al. [5] performed a systematic review on syndromic surveillance models used in forecasting emergency department visits; however, only 9 studies were found and none of them made use of Twitter or any social media. Subsequently, Charles-Smithe et al. [6] carried out a systematic review of the use of social media (not limited to Twitter) specifically for disease surveillance and outbreak management. Sinnenberg et al. performed another systematic review looking at Twitter as a tool for health research [7]. Their systematic review encompassed research in both the sciences and social sciences. We seek to carry out a scoping review in order to map the broad area of Twitter for public health research as well as to produce an updated review containing more recent studies carried out since the above reviews were published. Hence, our research question is: ‘‘What is known from the existing literature about the use of Twitter data in the context of monitoring, detection and forecasting of public health conditions?’’. We are particularly interested in the type of conditions/illnesses being studied; in the sources of data being used; in the data analysis techniques being applied; and in the geographical and time trends of such studies. (see Tables 3–7)

We deliver a summary of what has been done so far, which will enable researchers to quickly and efficiently understand this field in terms of the volume, nature and characteristics of the primary research undertaken and any gaps in research that may need prompt attention. Such evidence is particularly necessary in new but fast moving areas of research such as analysis of Twitter data for health applications.

A scoping review methodology was chosen to address our research question on the state of Twitter applications in the field of public health research. The scoping review is defined by Arksey and O’Malley [8] as a study that aims ‘‘to map rapidly the key concepts underpinning a research area and the main sources and types of evidence available, and can be undertaken as stand-alone projects in their own right, especially where an area is complex’’. For our scoping review, we made use of the Arksey and O’Malley framework, which adopts a rigorous process of transparency, enabling replication of the search strategy and increasing the reliability of the study findings. As Arksey and O’Malley [8] explain, the method consists of a number of stages: identifying the research question; identifying relevant studies; study selection; charting the data; and collating, summarizing and reporting the results (i.e. analysis). We elaborate on the specific application of the method to our scenario next.

2.1. Search strategy to identify relevant studies

To gain a broad coverage of the available literature, the general terms ‘‘Twitter’’ and ‘‘Public Health’’ were used as search keywords. We chose these two keywords as ‘‘Twitter’’ covers every discussion of the Twitter platform, and used together with ‘‘Public Health’’ covers all mention of Twitter in a health context. As our work is multidisciplinary in that it spans multiple fields, we conducted our search in both health and Information Technology (IT) databases. First, we performed a literature search in the health/medical database PubMed. Next, we searched the IT databases IEEE Xplore and the ACM Digital Library. Finally, we searched a general database that indexed both fields, Scopus. Our searches were refined such that we only included research articles which were peer-reviewed and in English. We also limited our search to only return results within the date range of January 2009 to March 2019, which was when the search was carried out. We started our search from 2009 because of the highly influential Google Flu Trends paper published that year, which inspired and kickstarted the use of social media as a data source for public health research [9].

2.2. Study selection

In accordance with best practice for systematic reviews and meta-analysis, we applied the guidelines for Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) [10] to select studies for inclusion in the analysis. The flowchart for PRISMA that corresponds to our review is shown in Fig. 1.

Fig. 1. PRISMA flow diagram for the identification and selection of studies.

754 research articles were returned by our search and 1 paper was added from the bibliographic listings of relevant retrieved papers. Of these 755 articles, we found 550 to be unique. We then drew up a list of criteria for inclusion and exclusion of articles in our review similar to those used by Shatte et al. [11]. These criteria are shown in Table 1. In short, articles were included if all the following criteria were met: (i) the article reported on a method or application of Twitter data to address a public health issue; (ii) the article evaluated the performance of the statistical or machine learning technique used in drawing utility from the Twitter data; (iii) the article was published in a peer-reviewed publication and (iv) the article was available in English. Articles were excluded if any of the following criteria were met: (i) the article did not report an original contribution (e.g. review papers or articles commenting or speculating on the state or future of such research); (ii) the article was focused on the use of Twitter for public health in the context of recruitment and outreach, public awareness and communication, information dissemination or opinion mining; (iii) the article did not make known the statistical or machine learning technique being used; (iv) the full text of the article was not available (e.g. conference abstracts). Guided by our inclusion and exclusion criteria, we identified and selected 92 articles to be included for the review (see Table 2).

Table 1
Inclusion and exclusion criteria.
Criterion Inclusion Exclusion

The focus of our review was to get an exploratory map of the key problems and concepts being tackled in the public health space through the use of Twitter and the techniques being used. To this effect, for each article in our review, data was collected on (i) the aim of the research; (ii) the disease or illness of focus; (iii) sources of data for the study; (iv) statistical or machine learning algorithms and methods used; (v) the country for which the study was carried out; and (vi) the year in which the study was carried out. To analyse the collected information, we used a narrative review synthesis to capture the broad range of research studying Twitter for public health in our scoping review.

3. Results

3.1. Study characteristics

As explained in section 2.2, the search strategies identified 755 articles, with 92 of these articles meeting the criteria for inclusion in this review. The mode publication year for articles was 2017 with a range of 2011–2019. Nineteen countries were represented in the studies, with the top 5 countries being the United States of America (US), United Kingdom (UK), Canada, India and China. See Fig. 3 for a breakdown of study activity by country.

The use of Twitter data was evident for a wide variety of diseases and health conditions. We observed a range of applications dealing with physical health and illnesses (n = 82) [e.g. influenza-like illnesses (ILIs), adverse drug events and reactions, sexually transmitted diseases, food-borne illnesses], mental health (n = 6) [e.g. suicide and depression], natural disasters and environmental issues (n = 5) [e.g. earthquakes, heat waves, air pollution] and social issues (n = 8) [e.g. drug abuse, smoking, alcoholism]. We examined the subjects of the studies for trends in Twitter applications. We analysed and plotted the three most studied diseases for each year. Fig. 4 shows the result of this analysis. Taking a closer look at the diseases, conditions and public health phenomena studied using Twitter data, we observed ILIs to be the most common. The next most common subject of public health research
using Twitter were drug abuse and adverse drug events and/or reactions (ADE/R). Furthermore, we observed a general rise in the quantity of research into the use of Twitter for public health. Research activity appears to have peaked in 2016 but seems to be on the rise from 2018. As this scoping review looks at studies up until March 2019, the data for 2019 is incomplete. This limitation is due to the fact that this review can only investigate studies until the time of its writing, which happened to be early in the year.

A myriad of statistical and machine learning techniques were used in the analysis of Twitter data for public health (see Fig. 2). Most studies implemented just one technique (n = 54) but some others made use of a mix of methods and techniques (n = 38). The articles made use of a range of statistical and machine learning techniques including supervised learning (n = 70) [e.g. Support Vector Machine (SVM), naive Bayes, decision trees, logistic regression], unsupervised learning (n = 18) [e.g. clustering, association rule mining], semi-supervised learning (n = 4) [e.g. graph learning, transductive support vector machine (t-SVM)], text analysis and natural language processing (n = 23) [e.g. latent Dirichlet allocation (LDA), biterm topic modelling, lexicon analysis], deep learning (n = 16) [e.g. Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), word and document embeddings], statistical modelling and analysis (n = 12) [e.g. correlation analysis, partial differential equation (PDE), TRAP] and time series analysis (n = 7) [e.g. Autoregressive Integrated Moving Average (ARIMA), time-series Susceptible-Infected-Recovered (TSIR) model]. The average number of Tweets used in the reviewed studies was roughly twenty thousand. A closer look at the research towards Twitter use for public health revealed that the SVM was a popular tool in this research field. We hypothesize that this is due to the SVM’s popularity and strength in text classification problems [12]. We also analysed the surveyed studies to find out which statistical or machine learning algorithms were popular, as well as if and how this might have shifted over time. Fig. 5 shows a plot of the most used algorithms for each year covered in this review. Lexicon-based analysis proved popular between 2012 and 2014. After this, Bayesian learning seemed to be the method of choice, followed by the SVM. From 2018, the widespread popularity of deep learning appears to have made its way into public health research with Twitter data, as it has become the dominant method used since then.
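To make the supervised text classification setting described in this section more concrete, the following is a minimal sketch of a multinomial naive Bayes classifier that labels tweets as health-relevant or not. Naive Bayes is used here instead of an SVM only because it fits in a few lines of standard-library Python; the function names, the two labels and the toy tweets are all hypothetical and are not taken from any of the reviewed studies.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labelled_tweets):
    """Collect per-class document and word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_tweets:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior, using Laplace smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, for illustration only.
TRAINING = [
    ("i have a fever and a bad cough", "relevant"),
    ("stuck in bed with the flu and a sore throat", "relevant"),
    ("my cough and fever are getting worse", "relevant"),
    ("bieber fever at the concert tonight", "irrelevant"),
    ("new phone arrived today so happy", "irrelevant"),
    ("watching the football match tonight", "irrelevant"),
]

model = train_naive_bayes(TRAINING)
print(predict(model, "terrible cough and sore throat"))  # relevant
print(predict(model, "happy about the match tonight"))   # irrelevant
```

In practice such a classifier would be trained on thousands of manually labelled tweets and evaluated on held-out data, which is the kind of performance evaluation the inclusion criteria of this review require.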
Table 2
Summary of statistical and machine learning methods and data sources for surveillance using Twitter data.
Public Health Issue Method Comparative Data Source
3.2. Application domains of Twitter in public health

Through the synthesis of the data obtained from the reviewed articles, we broadly identified 6 different ways in which Twitter data is used for public health research. The identified domains were: (i) surveillance (n = 41); (ii) event detection (n = 38); (iii) pharmacovigilance (n = 19); (iv) forecasting (n = 15); (v) disease tracking (n = 12) and (vi) geographic identification (n = 7). Note that these domains were not always mutually exclusive. Surveillance includes articles aiming to monitor some status over a period of time. Event detection includes articles that aim to discover and/or identify a health-related event from Twitter data. Pharmacovigilance includes articles which were concerned with public drug consumption and reactions to said drugs. Forecasting includes articles which aim to predict the trends for health-related events. Disease tracking includes articles attempting to observe or predict the spread of diseases in the public through Twitter. Geographic identification includes articles whose aim is to geolocate Twitter users, usually in order to facilitate or improve the application of one of the other domains.

We were interested in examining the trends, if any, in the public health application domains studied over the years. We constructed a bubble trend chart from the reviewed papers. This chart, included in Fig. 6, illustrates the research activity in each domain for each year, with the size of the bubble representing the number of articles for a given year and public health domain. It shows that there does appear to be a trend in activity for different public health domains. In 2011, there is little to moderate activity across the board. In the years following that, we see research in some domains drop on and off the map, and some growing steadily in size. Event detection, surveillance and pharmacovigilance appear to have seen steady increases in activity, leading the other domains. However, since 2016, research in those three domains has reduced slightly, with some focus switching to the other domains. The data for the year 2019 is not particularly informative, as the scoping review was only carried out in the first quarter of 2019.

We were also interested in the different techniques applied across different public health research domains. We computed a matrix of the application domains against the techniques applied and visualised it as a heatmap. This heatmap is shown in Fig. 7. Darker colours in the heatmap indicate higher activity for that cell. Supervised learning appears to see a lot of utility across the board. Deep learning and natural language processing also see a fair amount of utility, particularly in event detection,
pharmacovigilance and surveillance. Unsupervised learning seems to see some use in surveillance and event detection. On the other hand, semi-supervised learning appears to see the least use across the board.

The reviewed articles were found to exist within one or more of these domains. These domains are discussed in more detail below.

Table 3
Summary of statistical and machine learning methods and data sources for event detection using Twitter data.
Public Health Issue Method Complementary Data

Table 4
Summary of statistical and machine learning methods and data sources for pharmacovigilance using Twitter data.
Public Health Issue Method Complementary Data

Table 5
Summary of statistical and machine learning methods and data sources for forecasting using Twitter data.
Public Health Issue Method Complementary Data

Table 6
Summary of statistical and machine learning methods and data sources for disease tracking using Twitter data.
Public Health Issue Method Complementary Data

Table 7
Summary of statistical and machine learning methods and data sources for geographic identification using Twitter data.
Public Health Issue Method Complementary Data

3.2.1. Surveillance

Surveillance was the most popular research domain with around 43% of the reviewed articles represented. Research on surveillance focused on employing machine learning in order to utilize Twitter as an alternative or augmentative resource to traditional health surveillance systems. Naturally, the surveillance domain encompasses the field of syndromic surveillance [13–15]. However, it is broad and also includes additional applications such as the tracking of vaccination efforts [16] and monitoring of environmental conditions [17,18], as well as natural disaster reporting and alarming [19]. That being said, the most common application was the syndromic surveillance of influenza-like illnesses (ILIs). Besides ILIs, other diseases and conditions that were studied include dengue, HIV, gastroenteritis, ebola, diarrhoea and allergies. Due to the extensive research carried out in this area, a wide range of techniques were used. For example, supervised learning applied in the form of k-Nearest Neighbours (kNN) was used to monitor allergy trends and occurrences [20]. Unsupervised learning was used in the form of Density-based Spatial Clustering of Applications with Noise (DBSCAN) clustering in order to exploit the spatial and temporal properties of the Twitter stream for dengue surveillance [21]. Semi-supervised learning was used in the form of transductive SVMs for the surveillance of ILIs, gastroenteritis, diarrhoea and vomiting [22].
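As an illustration of the density-based clustering idea mentioned above, the sketch below implements a basic DBSCAN over two-dimensional points in plain Python. It is only a sketch under stated assumptions: the points are hypothetical planar coordinates (for example, kilometres on a local grid) standing in for geotagged tweets, the `eps` and `min_pts` values are arbitrary, and nothing here reproduces the method of any reviewed study.

```python
import math


def region_query(points, index, eps):
    """Return indices of all points within distance eps of points[index] (itself included)."""
    return [j for j, other in enumerate(points) if math.dist(points[index], other) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point, keep expanding
        cluster_id += 1
    return labels


# Hypothetical tweet locations: two dense groups and one isolated point.
POINTS = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
    (5.0, 5.0), (5.1, 5.2), (5.2, 5.1),
    (9.0, 0.0),
]

print(dbscan(POINTS, eps=0.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

The appeal of this family of methods for surveillance is that the number of clusters does not have to be fixed in advance and isolated reports are set aside as noise rather than forced into a cluster.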
Fig. 6. Bubble chart showing the trends of research activity in public health application domains with time. The size of the bubble represents the number of articles
in each category and year.
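The counts behind a bubble chart of this kind can be sketched as a simple tally of articles per year and domain. The records below are hypothetical placeholders rather than the charted data of this review, and the function name is an assumption made for illustration; an article listed under several domains contributes to each of them, since the domains are not mutually exclusive.

```python
from collections import Counter

# Hypothetical charting records: (publication year, application domains) per article.
ARTICLES = [
    (2016, ["surveillance", "forecasting"]),
    (2016, ["event detection"]),
    (2017, ["surveillance"]),
    (2017, ["pharmacovigilance", "event detection"]),
    (2017, ["surveillance", "geographic identification"]),
    (2018, ["disease tracking"]),
]


def bubble_sizes(articles):
    """Count articles per (year, domain); each count is the size of one bubble."""
    counts = Counter()
    for year, domains in articles:
        for domain in set(domains):  # guard against a domain listed twice
            counts[(year, domain)] += 1
    return counts


sizes = bubble_sizes(ARTICLES)
for (year, domain), n in sorted(sizes.items()):
    print(year, domain, n)
```

The same tally, keyed by (domain, technique) instead of (year, domain), would give the kind of matrix that can be drawn as a heatmap.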
population [55], depression and suicide [26], ebola [56] and most common of all, ILI [57]. Such research tends to be fairly recent with the mode publication year being 2016. The statistical and machine learning techniques used were typically supervised, with most studies employing either classification or regression to make the predictions necessary for detection. For example, SVMs were used to detect mention of ‘‘dabbing’’, a method of marijuana consumption that involves inhaling vapors from heating marijuana concentrates [58]. CNNs were used to detect harmful algal blooms from pictures posted on Twitter [59]. Additionally, stepwise regression was used to detect depression from Tweets in order to explore the effect of climate and seasonality on mood [60].

3.2.3. Pharmacovigilance

Research in pharmacovigilance focused mainly on adverse drug reactions and events, but also investigated recreational drug use and abuse. Usually, when studying the use of Twitter to detect adverse drug reactions and events, articles searched for a range of names obtained
from a thesaurus of drugs and events, such as the Medline Plus Drug Information [83]. However, other such studies focused on a drug for a particular disease such as HIV [63]. In addition, studies also investigated drug habits and their effects on the population. For example, one article studied the use of e-cigarettes and their utility for smoking cessation [62]. Another article studied the variability of alcoholism with time [84]. A number of the pharmacovigilance studies utilized sentiment analysis, usually a form of supervised text classification, to aid in their efforts [28,63,83]. In fact, most of the studies make use of supervised learning in the form of text classification using mostly SVMs and decision trees. Of the 19 articles in this domain, three made use of deep learning [28,85,86], one employed a semi-supervised multi-instance learning approach [86] and three used unsupervised natural language processing [28,66,87].

3.2.4. Forecasting

Forecasting research studies the prediction of public health trends, as well as means of nowcasting, which is the prediction of the present state of public health. It can be seen as a part of the syndromic surveillance effort, aimed at predicting epidemics in order to improve crisis response. Research in this domain is focused predominantly on ILIs. Around 67% of the reviewed literature studied ILI. However, other diseases such as dengue, gastroenteritis, cancer and asthma were also studied [22,23,53,95]. While a mix of statistics and machine learning is used in this domain, there is a heavier focus on statistics. In fact, most studies made use of statistical techniques like regression and time series analysis. For example, dynamic regression was used to predict influenza trends in Boston, USA [96]. AutoRegressive Integrated Moving Average (ARIMA) was used to forecast influenza cases on a city level in Chongqing, China, as well as for predicting gastroenteritis in the UK [22,97]. Partial differential equations were used to forecast influenza cases on a regional level across the USA [44]. Deep learning was also used to aid in the forecasting problem of predicting influenza cases [40] and in the creation of SENTINEL, a software system capable of nowcasting diseases being monitored by the US Centers for Disease Control and Prevention (CDC) [98]. Unsupervised learning was used in the form of topic modelling in a study aiming to predict health transition trends without any a priori diseases [51].

3.2.5. Disease tracking

Disease tracking is a domain that seeks to support epidemiology by offering insight into the spread of infectious diseases. Research in this domain is primarily interested in understanding the way in which diseases spread through a population. It aims not only to gain a better understanding of the spread of diseases, but also to keep track of the public health state during recognized outbreaks and mass gatherings which could be a breeding ground for disease. For example, one study investigated and proposed a means of tracking flu transmission in China using Twitter [39]. Another study retrospectively tracked the spread of measles during the 2015 outbreak [101]. Additionally, there was a study to detect the occurrence and spread of disease symptoms which could signify a potential outbreak at a number of British music festivals and a religious event in Mecca, Saudi Arabia [50]. Most studies in this domain made use of machine learning methods, leaning towards supervised learning. In particular, regression learning proved popular, as two studies utilized dynamic regression and support vector regression to track the spread of influenza [96,100]. Another study proposed a Gaussian mixture regression approach to estimating the geographic origin of a tweet for use during an outbreak [102]. There were also some studies which used statistical analysis to obtain impressive results. One such study made use of the TSIR (time-series Susceptible-Infected-Recovered) model to understand human mobility and the spread of the dengue virus in Lahore, Pakistan [103]. While it was rare, one study made use of semi-supervised learning and deep learning to simulate influenza epidemics.

3.2.6. Geographic identification

Geographic identification is a small domain which involves the extraction of geographical information from Twitter data and typically sees little use alone. Rather, it is used in conjunction with other domains to improve the efficacy of solutions or provide added benefit. It is most often used with surveillance and disease tracking. Methods used in geographic identification are typically based on unsupervised learning. For example, DBSCAN clustering was used to monitor and track obesity levels within the population [54], as well as track the spread of the dengue virus [21]. Another study utilized hot spot analysis to examine spatial patterns of depression on Twitter. Some supervised learning, typically in the form of classification, is also used in geographic identification. Here, a classifier is used to predict the location of a tweet based
on some features of the tweet, usually its word collocations. As an example, one study in the review made use of a random forest classifier to predict which city and province a tweet, determined to be from Canada (according to the Twitter API), was from [105]. While geographic identification in itself is not of major use to the field of public health, when combined with other identified public health research domains, it offers improvements on the specificity and granularity of their results.

4. Discussion

This review has compiled and analysed the published literature on the use of Twitter data for public health, highlighting popular and current research and applications. In terms of research undertaken so far, three findings were produced from the review. First, we identified the key application domains being studied: (i) surveillance; (ii) event detection; (iii) pharmacovigilance; (iv) forecasting; (v) disease tracking and (vi) geographic identification. Studies were found to predominantly be concerned with surveillance, event detection and pharmacovigilance. Next, the conditions and diseases being tackled using Twitter data were identified. We discovered a wide range of illnesses to which Twitter data is being applied, including infectious diseases, mental health problems, environmental issues and social issues. Finally, we mapped out the statistical and machine learning algorithms and approaches being used to process and analyse Twitter data for public health purposes. In doing so, we observed trends in these approaches. Bayesian learning and SVMs appear to be popular algorithms of choice; however, in the past two years the focus seems to have shifted towards deep learning.

Our findings so far will enable researchers working in health data to identify relevant studies in different application areas, tackling different diseases or conditions, and will also provide evidence of analysis techniques that have been applied in each context. This will enable faster development of new applications, which is an important contribution of our research given the growth in the use of Twitter around the world, and particularly in Low and Middle Income Countries (LMIC). The use of Twitter in a health context can present new practical and affordable solutions for implementing disease monitoring and surveillance in countries with weak health systems.

While research toward using Twitter for public health has been extensive, our study has also identified some gaps for future researchers to fill. The identification of gaps is an important deliverable of a scoping review and hence a contribution of our work.

In terms of diseases tackled so far, understandably, studies are focused on infectious diseases because of their global importance. In particular, the reviewed research focused heavily on the surveillance and detection of influenza. However, we have identified significant scope to explore the use of Twitter data in other infectious diseases. Some such studies are beginning to take place (e.g. dengue or ebola) but

application of supervised learning techniques. This is somewhat understandable as the most popular application domains were surveillance and detection, which are related to the supervised learning tasks of classification and prediction. The average number of Tweets used in the reviewed studies was roughly twenty thousand. This suggests that most of the reviewed articles had large amounts of labelled Twitter data available to them, which lends itself to supervised learning tasks. Unfortunately, such labelling could constitute a sizeable effort, so we have identified the use of unsupervised learning, and particularly semi-supervised learning, as another potential area for new exploration. Such approaches would reduce the amount of labelled Twitter data required by also taking advantage of the unlabelled data. Some articles are already starting to emerge [91,104] but have mostly focused only on ILI so far.

Furthermore, in terms of application areas, despite the rich potential for success from using Twitter data for public health which was identified in the literature, there were few articles describing active Twitter-based systems and/or their evaluation in an operational context for routine public health practice. This may suggest that it is somewhat difficult to translate research using Twitter for public health into practice. We believe the bulk of this challenge might come from the ethical issues involved and the lack of an ethical framework for the integration of social media into surveillance systems. Hence the development of robust ethical frameworks could be an important area for future work. That being said, public health institutions around the world may already be using Twitter as such a tool, and just not reporting their efforts.

It is also important to note that this review had some limitations. Constraints in the search methodology such as the use of broad search terms and the exclusion of works-in-progress may have resulted in some relevant studies being missed. However, this is a common limitation of scoping reviews as they are intended to broadly map topics, achieving a good balance of breadth and depth in a relatively quick time-frame [107].

5. Conclusion

This review makes an important contribution by successfully giving an overview of the use of Twitter data in the context of monitoring, detection and forecasting of public health conditions. We provide insightful analysis of the existing literature in the field, including the type of conditions being monitored; the data analysis techniques being used and the application areas most commonly found. We also analysed time trends to understand how research in this area is evolving over time. Such information will be useful in aiding researchers, clinicians and policy makers in understanding the modern landscape of public health applications for social media.

To conclude, research into the application of Twitter data for public health has uncovered interesting and inspiring advances, especially in
much more work is expected in the light of recent outbreaks. Often recent years, and identified gaps in the knowledge thus allowing tar
outbreaks are fast moving situations and research needs to progress very geted research in the future. Overall, we see that Twitter data has been
quickly so our findings will facilitate such endevours. Whereas we may used to aid in pubic health efforts concerned with surveillance, event
not expect Twitter data to be of use for the study of sexually transmitted detection, pharmacovigilance, forecasting, disease tracking and
diseases (STDs) as such a study would rely on Twitter user-reporting geographic identification, demonstrating positive results. We have un
what may be quite sensitive information, other infectious diseases covered the need to evaluate the use of Twitter in less studied epide
such as cholera could be studied. Furthermore, we have also identified miological diseases and other non-epidemiological conditions. We also
the potential utility of Twitter and social media for public health in the uncovered scope to apply semi-supervised algorithms to the task in hand
context of non-infectious diseases, such as asthma or celiac disease as to reduce labelling efforts. Furthermore, we have identified the need for
little work has so far been reported in the literature, yet those diseases a robust framework including ethics to translate research into an oper
can represent a large health burden. An additional area of application ational context and produce working systems.
may be the occurrence of positive health states/outcomes. Our review With the richness of Twitter as a data source, is semi-real time na
did not identify any articles that used Twitter for this, although it might ture, the take up of mobile devices in LMIC that give access to such
be a result of the limitations of our scoping methodology. platforms and with the development of machine learning tools and their
In terms of analysis techniques employed so far, there was wide increasing accessibility, we expect to see more interesting ideas and
applications of Twitter to public health.

Declaration of competing interest

None Declared.

Acknowledgements

We acknowledge support from Grant Number ES/L011859/1, from The Business and Local
Government Data Research Centre, funded by the Economic and Social Research Council
to provide economic, scientific and social researchers and business analysts with
secure data services.

References

[1] World Health Organisation Who, The world health report 2007 - a safer future: global public health security in the 21st century. http://www.who.int/whr/2007/en/, 2007.
[2] S. Triple, Assessment of syndromic surveillance in europe, Lancet 378 (9806) (2011) 1833.
[3] C.-E. Winslow, The untilled fields of public health, Science (1920) 23–33.
[4] B.L. Neiger, R. Thackeray, S.H. Burton, C.G. Giraud-Carrier, M.C. Fagen, Evaluating social media’s capacity to develop engaged audiences in health promotion settings: use of twitter metrics as a case study, Health Promot. Pract. 14 (2) (2013) 157–162.
[5] M. Wargon, B. Guidet, T. Hoang, G. Hejblum, A systematic review of models for forecasting the number of emergency department visits, Emerg. Med. J. 26 (6) (2009) 395–399.
[6] L.E. Charles-Smith, T.L. Reynolds, M.A. Cameron, M. Conway, E.H. Lau, J.M. Olsen, J.A. Pavlin, M. Shigematsu, L.C. Streichert, K.J. Suda, et al., Using social media for actionable disease surveillance and outbreak management: a systematic literature review, PloS One 10 (10) (2015) e0139701.
[7] L. Sinnenberg, A.M. Buttenheim, K. Padrez, C. Mancheno, L. Ungar, R.M. Merchant, Twitter as a tool for health research: a systematic review, Am. J. Publ. Health 107 (1) (2017) e1–e8.
[8] H. Arksey, L. O’Malley, Scoping studies: towards a methodological framework, Int. J. Soc. Res. Methodol. 8 (1) (2005) 19–32.
[9] J. Ginsberg, M.H. Mohebbi, R.S. Patel, L. Brammer, M.S. Smolinski, L. Brilliant, Detecting influenza epidemics using search engine query data, Nature 457 (7232) (2009) 1012.
[10] D. Moher, A. Liberati, J. Tetzlaff, D.G. Altman, P. Group, et al., Preferred Reporting Items for Systematic Reviews and Meta-Analyses: the Prisma Statement, 2010.
[11] A.B. Shatte, D.M. Hutchinson, S.J. Teague, Machine learning in mental health: a scoping review of methods and applications, Psychol. Med. (2019) 1–23.
[12] T. Joachims, Text categorization with support vector machines: learning with many relevant features, in: European Conference on Machine Learning, Springer, 1998, pp. 137–142.
[13] C. Hankin, O. Serban, N. Thapen, B. Maginnis, V. Foot, Real-time processing of social media with sentinel: a syndromic surveillance system incorporating deep learning for health classification.
[14] L. Chen, K.S.M.T. Hossain, P. Butler, N. Ramakrishnan, B.A. Prakash, Syndromic surveillance of flu on twitter using weakly supervised temporal topic models, Data Min. Knowl. Discov. 30 (3) (2015) 681–710, https://doi.org/10.1007/s10618-015-0434-x.
[15] D. Janies, Z. Witter, C. Gibson, T. Kraft, I.F. Senturk, Ü. Çatalyürek, Syndromic surveillance of infectious diseases meets molecular epidemiology in a workflow and phylogeographic application, Stud. Health Technol. Inf. 216 (2015) 766–770.
[16] S. Song, Z.B. Miled, Digital immunization surveillance: monitoring flu vaccination rates using online social networks, in: 2017 IEEE 14th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), IEEE, 2017, https://doi.org/10.1109/mass.2017.96.
[17] Y. Hswen, Q. Qin, J.S. Brownstein, J.B. Hawkins, Feasibility of using social media to monitor outdoor air pollution in london, england, Prev. Med. 121 (2019) 86–93, https://doi.org/10.1016/j.ypmed.2019.02.005.
[18] J. Jung, C.K. Uejio, Social media responses to heat waves, Int. J. Biometeorol. 61 (7) (2017) 1247–1260, https://doi.org/10.1007/s00484-016-1302-0.
[19] R. Auxilia, M. Gandhi, Earthquake reporting system development by tweet analysis with approach earthquake alarm systems, Res. J. Pharmaceut. Biol. Chem. Sci. 7 (3) (2016) 501–506.
[20] K. Nargund, S. Natarajan, Public health allergy surveillance using micro-blogs, in: 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), IEEE, 2016, https://doi.org/10.1109/icacci.2016.7732248.
[21] J. Gomide, A. Veloso, W. Meira, V. Almeida, F. Benevenuto, F. Ferraz, M. Teixeira, Dengue surveillance based on a computational model of spatio-temporal locality of twitter, in: Proceedings of the 3rd International Web Science Conference on - WebSci, vol. 11, ACM Press, 2011, https://doi.org/10.1145/2527031.2527049.
[22] N. Thapen, D. Simmie, C. Hankin, J. Gillard, Defender, Detecting and forecasting epidemics using novel data-analytics for enhanced response, PloS One 11 (5) (2016) e0155417, https://doi.org/10.1371/journal.pone.0155417.
[23] K. Lee, A. Agrawal, A. Choudhary, Real-time disease surveillance using twitter data, in: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD, vol. 13, ACM Press, 2013, https://doi.org/10.1145/2487575.2487709.
[24] M. Krieck, L. Otrusina, P. Smrz, P. Dolog, W. Nejdl, E. Velasco, K. Denecke, How to exploit twitter for public health monitoring? Methods Inf. Med. 52 (4) (2013) 326–339, https://doi.org/10.3414/me12-02-0010.
[25] Y. Khan, G.J. Leung, P. Belanger, E. Gournis, D.L. Buckeridge, L. Liu, Y. Li, I.L. Johnson, Comparing twitter data to routine data sources in public health surveillance for the 2015 pan/parapan american games: an ecological study, Can. J. Public Health 109 (3) (2018) 419–426, https://doi.org/10.17269/s41997-018-0059-0.
[26] C. McClellan, M.M. Ali, R. Mutter, L. Kroutil, J. Landwehr, Using social media to monitor mental health discussions - evidence from twitter, J. Am. Med. Inf. Assoc. (2016) ocw133, https://doi.org/10.1093/jamia/ocw133.
[27] N. Thangarajan, N. Green, A. Gupta, S. Little, N. Weibel, Analyzing social media to characterize local HIV at-risk populations, in: Proceedings of the Conference on Wireless Health - WH, vol. 15, ACM Press, 2015, https://doi.org/10.1145/2811780.2811923.
[28] P. Breen, J. Kelly, T. Heckman, S. Quinn, Mining pre-exposure prophylaxis trends in social media, in: IEEE International Conference on Data Science and Advanced Analytics (DSAA), IEEE, 2016, https://doi.org/10.1109/dsaa.2016.29.
[29] S.D. Young, N. Mercer, R.E. Weiss, E.A. Torrone, S.O. Aral, Using social media as a tool to predict syphilis, Prev. Med. 109 (2018) 58–61, https://doi.org/10.1016/j.ypmed.2017.12.016.
[30] B. Ofoghi, M. Mann, K. Verspoor, Towards early discovery of salient health threats: a social media emotion classification technique, in: Biocomputing 2016: Proceedings of the Pacific Symposium, World Scientific, 2016, pp. 504–515.
[31] E. Diaz-Aviles, A. Stewart, Tracking twitter for epidemic intelligence, in: Proceedings of the 3rd Annual ACM Web Science Conference on - WebSci ’12, ACM Press, 2012, https://doi.org/10.1145/2380718.2380730.
[32] A. Sadilek, H. Kautz, L. DiPrete, B. Labus, E. Portman, J. Teitel, V. Silenzio, Deploying nemesis: preventing foodborne illness by data mining social media, AI Mag. 38 (1) (2017) 37–48.
[33] S. Liu, M. Zhu, D.J. Yu, A. Rasin, S.D. Young, Using real-time social media technologies to monitor levels of perceived stress and emotional state in college students: a web-based questionnaire study, JMIR Mental Health 4 (1) (2017) e2, https://doi.org/10.2196/mental.5626.
[34] M. Riga, K. Karatzas, Investigating the relationship between social media content and real-time observations for urban air quality and public health, in: Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14) - WIMS ’14, ACM Press, 2014, https://doi.org/10.1145/2611040.2611093.
[35] G. Lin, R.N. Zaeem, H. Sun, K.S. Barber, Trust filter for disease surveillance: Identity, in: Intelligent Systems Conference (IntelliSys), IEEE, 2017, https://doi.org/10.1109/intellisys.2017.8324259.
[36] O. Şerban, N. Thapen, B. Maginnis, C. Hankin, V. Foot, Real-time processing of social media with SENTINEL: a syndromic surveillance system incorporating deep learning for health classification, Inf. Process. Manag. 56 (3) (2019) 1166–1184, https://doi.org/10.1016/j.ipm.2018.04.011.
[37] J. Parker, A. Yates, N. Goharian, O. Frieder, Health-related hypothesis generation using social media data, Social Network Analysis and Mining 5 (1) (2015), https://doi.org/10.1007/s13278-014-0239-8.
[38] J.D. Sharpe, R.S. Hopkins, R.L. Cook, C.W. Striley, Evaluating google, twitter, and wikipedia as tools for influenza surveillance using bayesian change point analysis: a comparative analysis, JMIR public health and surveillance 2 (2).
[39] J. Huang, H. Zhao, J. Zhang, Detecting flu transmission by social sensor in China, in: IEEE International Conference on Green Computing and Communications and IEEE Internet of Things and IEEE Cyber, Physical and Social Computing, IEEE, 2013, https://doi.org/10.1109/greencom-ithings-cpscom.2013.216.
[40] K. Lee, A. Agrawal, A. Choudhary, Forecasting influenza levels using real-time social media streams, in: IEEE International Conference on Healthcare Informatics (ICHI), IEEE, 2017, https://doi.org/10.1109/ichi.2017.68.
[41] K. Byrd, A. Mansurov, O. Baysal, Mining twitter data for influenza detection and surveillance, in: Proceedings of the International Workshop on Software Engineering in Healthcare Systems, ACM, 2016, pp. 43–49.
[42] D.A. Broniatowski, M. Dredze, M.J. Paul, A. Dugas, Using social media to perform local influenza surveillance in an inner-city hospital: a retrospective observational study, JMIR public health and surveillance 1 (1) (2015).
[43] C. Allen, M.-H. Tsou, A. Aslam, A. Nagel, J.-M. Gawron, Applying GIS and machine learning methods to twitter data for multiscale surveillance of influenza, PloS One 11 (7) (2016) e0157734, https://doi.org/10.1371/journal.pone.0157734.
[44] F. Wang, H. Wang, K. Xu, R. Raymond, J. Chon, S. Fuller, A. Debruyn, Regional level influenza study with geo-tagged twitter data, J. Med. Syst. 40 (8) (2016), https://doi.org/10.1007/s10916-016-0545-y.
[45] H. Achrekar, A. Gandhe, R. Lazarus, S.-H. Yu, B. Liu, Predicting flu trends using twitter data, in: IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), IEEE, 2011, https://doi.org/10.1109/infcomw.2011.5928903.
[46] X. Dai, M. Bikdash, Distance-based outliers method for detecting disease outbreaks using social media, in: SoutheastCon 2016, IEEE, 2016, https://doi.org/10.1109/secon.2016.7506752.
[47] L. Chen, K.T. Hossain, P. Butler, N. Ramakrishnan, B.A. Prakash, Flu gone viral: syndromic surveillance of flu on twitter using temporal topic models, in: IEEE International Conference on Data Mining, IEEE, 2014, https://doi.org/10.1109/icdm.2014.137.
[48] J. Parker, Y. Wei, A. Yates, O. Frieder, N. Goharian, A framework for detecting public health trends with twitter, in: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining - ASONAM, vol. 13, ACM Press, 2013, https://doi.org/10.1145/2492517.2492544.
[49] A. Culotta, Estimating county health statistics with twitter, in: Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems - CHI, vol. 14, ACM Press, 2014, https://doi.org/10.1145/2556288.2557139.
[50] E. Yom-Tov, D. Borsa, I.J. Cox, R.A. McKendry, Detecting disease outbreaks in mass gatherings using internet data, J. Med. Internet Res. 16 (6) (2014) e154, https://doi.org/10.2196/jmir.3156.
[51] S. Sidana, S. Amer-Yahia, M. Clausel, M. Rebai, S.T. Mai, M.-R. Amini, Health monitoring on social media over time, IEEE Trans. Knowl. Data Eng. 30 (8) (2018) 1467–1480, https://doi.org/10.1109/tkde.2018.2795606.
[52] E.D. Livelo, C. Cheng, Intelligent dengue infoveillance using gated recurrent neural learning and cross-label frequencies, in: IEEE International Conference on Agents (ICA), IEEE, 2018, https://doi.org/10.1109/agents.2018.8459963.
[53] C. de Almeida Marques-Toledo, C.M. Degener, L. Vinhal, G. Coelho, W. Meira, C.T. Codeço, M.M. Teixeira, Dengue prediction by the web: tweets are a useful tool for estimating and forecasting dengue at country and city level, PLoS Neglected Trop. Dis. 11 (7) (2017) e0005729, https://doi.org/10.1371/journal.pntd.0005729.
[54] D. Khanaferov, C. Luc, T. Wang, Social network data mining using natural language processing and density based clustering, in: IEEE International Conference on Semantic Computing, IEEE, 2014, https://doi.org/10.1109/icsc.2014.48.
[55] T.K. Mackey, J. Kalyanam, Detection of illicit online sales of fentanyls via twitter, F1000Research 6 (2017) 1937, https://doi.org/10.12688/f1000research.12914.1.
[56] K. Rudra, A. Sharma, N. Ganguly, M. Imran, Classifying information from microblogs during epidemics, in: Proceedings of the 2017 International Conference on Digital Health - DH, vol. 17, ACM Press, 2017, https://doi.org/10.1145/3079452.3079491.
[57] X. Dai, M. Bikdash, Hybrid classification for tweets related to infection with influenza, in: SoutheastCon 2015, IEEE, 2015, https://doi.org/10.1109/secon.2015.7133015.
[58] A.A. Ginart, S. Das, J.K. Harris, R. Wong, H. Yan, M. Krauss, P.A. Cavazos-Rehg, Drugs or dancing? using real-time machine learning to classify streamed “dabbing” homograph tweets, in: 2016 IEEE International Conference on Healthcare Informatics (ICHI), IEEE, 2016, https://doi.org/10.1109/ichi.2016.97.
[59] A.C. Kumar, S.M. Bhandarkar, A deep learning paradigm for detection of harmful algal blooms, in: IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2017, https://doi.org/10.1109/wacv.2017.88.
[60] W. Yang, L. Mu, Y. Shen, Effect of climate and seasonality on depressed mood among twitter users, Appl. Geogr. 63 (2015) 184–191, https://doi.org/10.1016/j.apgeog.2015.06.017.
[61] A. Esperanca, Z.B. Miled, M. Mahoui, Social media sensing framework for population health, in: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), IEEE, 2019, https://doi.org/10.1109/ccwc.2019.8666534.
[62] Y. Aphinyanaphongs, A. Lulejian, D.P. Brown, R. Bonneau, P. Krebs, Text classification for automatic detection of e-cigarette use and use for smoking cessation from twitter: a feasibility pilot, in: Biocomputing 2016: Proceedings of the Pacific Symposium, World Scientific, 2016, pp. 480–491.
[63] C. Adrover, T. Bodnar, Z. Huang, A. Telenti, M. Salathé, Identifying adverse effects of HIV drug treatment and associated sentiments using twitter, JMIR Public Health and Surveillance 1 (2) (2015) e7, https://doi.org/10.2196/publichealth.4488.
[64] K. Lee, A. Agrawal, A. Choudhary, Mining social media streams to improve public health allergy surveillance, in: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015 - ASONAM, vol. 15, ACM Press, 2015, https://doi.org/10.1145/2808797.2808896.
[65] N. Phan, S.A. Chun, M. Bhole, J. Geller, Enabling real-time drug abuse detection in tweets, in: IEEE 33rd International Conference on Data Engineering (ICDE), IEEE, 2017, https://doi.org/10.1109/icde.2017.221.
[66] T.K. Mackey, J. Kalyanam, T. Katsuki, G. Lanckriet, Twitter-based detection of illegal online sale of prescription opioid, Am. J. Publ. Health 107 (12) (2017) 1910–1915, https://doi.org/10.2105/ajph.2017.303994.
[67] P.M. Massey, A. Leader, E. Yom-Tov, A. Budenz, K. Fisher, A.C. Klassen, Applying multiple data collection tools to quantify human papillomavirus vaccine communication on twitter, J. Med. Internet Res. 18 (12).
[68] B. Zou, V. Lampos, R. Gorton, I.J. Cox, On infectious intestinal disease surveillance using social media content, in: Proceedings of the 6th International Conference on Digital Health Conference - DH, vol. 16, ACM Press, 2016, https://doi.org/10.1145/2896338.2896372.
[69] J. Wang, L. Zhao, Y. Ye, Y. Zhang, Adverse event detection by integrating twitter data and VAERS, J. Biomed. Semant. 9 (1) (2018), https://doi.org/10.1186/s13326-018-0184-y.
[70] W. Yang, L. Mu, GIS analysis of depression among twitter users, Appl. Geogr. 60 (2015) 217–223, https://doi.org/10.1016/j.apgeog.2014.10.016.
[71] P. Nambisan, Z. Luo, A. Kapoor, T.B. Patrick, R.A. Cisler, Social media, big data, and public health informatics: ruminating behavior of depression revealed through twitter, in: 2015 48th Hawaii International Conference on System Sciences, IEEE, 2015, https://doi.org/10.1109/hicss.2015.351.
[72] H. Lee, J.H. McAuley, M. Hübscher, H.G. Allen, S.J. Kamper, G.L. Moseley, Tweeting back: predicting new cases of back pain with mass social media data, J. Am. Med. Inf. Assoc. 23 (3) (2015) 644–648, https://doi.org/10.1093/jamia/ocv168.
[73] J.K. Harris, R. Mansour, B. Choucair, J. Olson, C. Nissen, J. Bhatt, Health Department Use of Social Media to Identify Foodborne Illness––chicago, illinois, 2013–2014, MMWR. Morbidity and mortality weekly report, vol. 63, 2014, p. 681, 32.
[74] N. Heaivilin, B. Gerbert, J. Page, J. Gibbs, Public health surveillance of dental pain via twitter, J. Dent. Res. 90 (9) (2011) 1047–1051, https://doi.org/10.1177/0022034511415273.
[75] X. Dai, M. Bikdash, B. Meyer, From social media to public health surveillance: word embedding based clustering method for twitter classification, in: SoutheastCon, IEEE, 2017, https://doi.org/10.1109/secon.2017.7925400.
[76] S. Lim, C.S. Tucker, S. Kumara, An unsupervised machine learning model for discovering latent infectious diseases using social media data, J. Biomed. Inf. 66 (2017) 82–94, https://doi.org/10.1016/j.jbi.2016.12.007.
[77] D.A. Broniatowski, M.J. Paul, M. Dredze, National and local influenza surveillance through twitter: an analysis of the 2012-2013 influenza epidemic, PloS One 8 (12) (2013) e83672, https://doi.org/10.1371/journal.pone.0083672.
[78] M. Wagner, V. Lampos, I.J. Cox, R. Pebody, The added value of online user-generated content in traditional methods for influenza surveillance, Sci. Rep. 8 (1) (2018), https://doi.org/10.1038/s41598-018-32029-6.
[79] S. Wakamiya, Y. Kawai, E. Aramaki, Twitter-based influenza detection after flu peak via tweets with indirect information: text mining study, JMIR Public Health and Surveillance 4 (3) (2018) e65, https://doi.org/10.2196/publichealth.8627.
[80] H. Woo, H.S. Cho, E. Shim, J.K. Lee, K. Lee, G. Song, Y. Cho, Identification of keywords from twitter and web blog posts to detect influenza epidemics in korea, Disaster Med. Public Health Prep. 12 (3) (2017) 352–359, https://doi.org/10.1017/dmp.2017.84.
[81] H. Hu, H. Wang, F. Wang, D. Langley, A. Avram, M. Liu, Prediction of influenza-like illness based on the improved artificial tree algorithm and artificial neural network, Sci. Rep. 8 (1) (2018), https://doi.org/10.1038/s41598-018-23075-1.
[82] E.E. Küçük, K. Yapar, D. Küçük, D. Küçük, Ontology-based automatic identification of public health-related Turkish tweets, Comput. Biol. Med. 83 (2017) 1–9, https://doi.org/10.1016/j.compbiomed.2017.02.001.
[83] Y. Peng, M. Moh, T.-S. Moh, Efficient adverse drug event extraction using twitter sentiment analysis, in: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), IEEE, 2016, https://doi.org/10.1109/asonam.2016.7752365.
[84] J.H. West, P.C. Hall, C.L. Hanson, K. Prier, C. Giraud-Carrier, E.S. Neeley, M.D. Barnes, Temporal variability of problem drinking on twitter, Open J. Prev. Med. 2 (2012) 43, 01.
[85] W.-S. Lin, H.-J. Dai, J. Jonnagaddala, N.-W. Chang, T.R. Jue, U. Iqbal, J.Y.-H. Shao, I.-J. Chiang, Y.-C. Li, Utilizing different word representation methods for twitter data in adverse drug reactions extraction, in: 2015 Conference on Technologies and Applications of Artificial Intelligence (TAAI), IEEE, 2015, https://doi.org/10.1109/taai.2015.7407070.
[86] S. Gupta, S. Pawar, N. Ramrakhiyani, G.K. Palshikar, V. Varma, Semi-supervised recurrent neural network for adverse drug reaction mention extraction, BMC Bioinf. 19 (S8) (2018), https://doi.org/10.1186/s12859-018-2192-4.
[87] G.J. Kang, S.R. Ewing-Nelson, L. Mackey, J.T. Schlitt, A. Marathe, K.M. Abbas, S. Swarup, Semantic network analysis of vaccine sentiment in online social media, Vaccine 35 (29) (2017) 3621–3638, https://doi.org/10.1016/j.vaccine.2017.05.052.
[88] M. Chary, N. Genes, C. Giraud-Carrier, C. Hanson, L.S. Nelson, A.F. Manini, Epidemiology from tweets: estimating misuse of prescription opioids in the USA from social media, J. Med. Toxicol. 13 (4) (2017) 278–286, https://doi.org/10.1007/s13181-017-0625-5.
[89] I. Korkontzelos, A. Nikfarjam, M. Shardlow, A. Sarker, S. Ananiadou, G.H. Gonzalez, Analysis of the effect of sentiment analysis on extracting adverse drug reactions from tweets and forum posts, J. Biomed. Inf. 62 (2016) 148–158, https://doi.org/10.1016/j.jbi.2016.06.007.
[90] K. O’Connor, P. Pimpalkhute, A. Nikfarjam, R. Ginn, K.L. Smith, G. Gonzalez, Pharmacovigilance on twitter? mining tweets for adverse drug reactions, in: AMIA Annual Symposium Proceedings, vol. 2014, American Medical Informatics Association, 2014, p. 924.
[91] J. Wang, L. Zhao, Y. Ye, Semi-supervised multi-instance interpretable models for flu shot adverse event detection, in: 2018 IEEE International Conference on Big Data (Big Data), IEEE, 2018, https://doi.org/10.1109/bigdata.2018.8622434.
[92] J. Bian, U. Topaloglu, F. Yu, Towards large-scale twitter mining for drug-related adverse events, in: Proceedings of the 2012 International Workshop on Smart Health and Wellbeing - SHB ’12, ACM Press, 2012, https://doi.org/10.1145/2389707.2389713.
[93] A.A. Hamed, R. Roose, M. Branicki, A. Rubin, T-recs: time-aware twitter-based drug recommender system, in: 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, IEEE, 2012, https://doi.org/10.1109/asonam.2012.178.
[94] I. Kagashe, Z. Yan, I. Suheryani, Enhancing seasonal influenza surveillance: topic analysis of widely used medicinal drugs using twitter data, J. Med. Internet Res. 19 (9) (2017) e315, https://doi.org/10.2196/jmir.7393.
[95] S. Ram, W. Zhang, M. Williams, Y. Pengetnze, Predicting asthma-related emergency department visits using big data, IEEE Journal of Biomedical and Health Informatics 19 (4) (2015) 1216–1223, https://doi.org/10.1109/jbhi.2015.2404829.
[96] F.S. Lu, S. Hou, K. Baltrusaitis, M. Shah, J. Leskovec, R. Sosic, J. Hawkins, J. Brownstein, G. Conidi, J. Gunn, J. Gray, A. Zink, M. Santillana, Accurate influenza monitoring and forecasting using novel internet data streams: a case study in the boston metropolis, JMIR Public Health and Surveillance 4 (1) (2018) e4, https://doi.org/10.2196/publichealth.8950.
[97] K. Su, Y. Xiong, L. Qi, Y. Xia, B. Li, L. Yang, Q. Li, W. Tang, X. Li, X. Ruan, S. Lu, X. Chen, C. Shen, J. Xu, L. Xu, M. Han, J. Xiao, City-wide influenza forecasting based on multi-source data, in: 2018 IEEE International Conference on Big Data (Big Data), IEEE, 2018, https://doi.org/10.1109/bigdata.2018.8622413.
[98] O. Serban, N. Thapen, B. Maginnis, C. Hankin, V. Foot, Real-time processing of social media with sentinel: a syndromic surveillance system incorporating deep learning for health classification, Inf. Process. Manag. 56 (3) (2019) 1166–1184.
[99] P.A. Valli, M. Uma, T. Sasikala, Tracing out various diseases by analyzing twitter data applying data mining techniques, in: International Conference on Energy, Communication, Data Analytics and Soft Computing, ICECDS, IEEE, 2017, https://doi.org/10.1109/icecds.2017.8389714.
[100] A. Signorini, A.M. Segre, P.M. Polgreen, The use of twitter to track levels of disease activity and public concern in the u.s. during the influenza a h1n1 pandemic, PloS One 6 (5) (2011) e19467, https://doi.org/10.1371/journal.pone.0019467.
[101] L. Tang, B. Bie, D. Zhi, Tweeting about measles during stages of an outbreak: a semantic network approach to the framing of an emerging infectious disease, Am. J. Infect. Contr. 46 (12) (2018) 1375–1380, https://doi.org/10.1016/j.ajic.2018.05.019.
[102] H. Iso, S. Wakamiya, E. Aramaki, Conditional density estimation of tweet location: a feature-dependent approach, in: MEDINFO 2017: Precision Healthcare through Informatics: Proceedings of the 16th World Congress on Medical and Health Informatics, vol. 245, IOS Press, 2018, p. 408.
[103] M.U.G. Kraemer, D. Bisanzio, R.C. Reiner, R. Zakar, J.B. Hawkins, C.C. Freifeld, D.L. Smith, S.I. Hay, J.S. Brownstein, T.A. Perkins, Inferences about spatiotemporal variation in dengue virus transmission are sensitive to assumptions about human mobility: a case study using geolocated tweets from lahore, Pakistan, EPJ Data Science 7 (1) (2018), https://doi.org/10.1140/epjds/s13688-018-0144-x.
[104] L. Zhao, J. Chen, F. Chen, W. Wang, C.-T. Lu, N. Ramakrishnan, Simnest: social media nested epidemic simulation via online semi-supervised deep learning, in: 2015 IEEE International Conference on Data Mining, IEEE, 2015, pp. 639–648.
[105] H. Samuel, B. Noori, S. Farazi, O. Zaiane, Context prediction in the social web using applied machine learning: a study of canadian tweeters, in: 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI), IEEE, 2018, https://doi.org/10.1109/wi.2018.00-85.
[106] S. Jenson, M. Reeves, M. Tomasini, R. Menezes, Mining location information from users’ spatio-temporal data, in: 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), IEEE, 2017, https://doi.org/10.1109/uic-atc.2017.8397519.
[107] M.T. Pham, A. Rajić, J.D. Greig, J.M. Sargeant, A. Papadopoulos, S.A. McEwen, A scoping review of scoping reviews: advancing the approach and enhancing the consistency, Res. Synth. Methods 5 (4) (2014) 371–385.