
How (Not) To Predict Elections

Panagiotis T. Metaxas, Eni Mustafaraj
Department of Computer Science
Wellesley College
Wellesley, MA, USA
(pmetaxas, emustafa)@wellesley.edu

Daniel Gayo-Avello
Departamento de Informática
Universidad de Oviedo
Oviedo, Asturias, Spain
dani@uniovi.es

Abstract—Using social media for political discourse is increasingly becoming common practice, especially around election time. Arguably, one of the most interesting aspects of this trend is the possibility of “pulsing” the public’s opinion in near real-time and, thus, it has attracted the interest of many researchers as well as news organizations. Recently, it has been reported that predicting electoral outcomes from social media data is feasible, and in fact quite simple to compute. Positive results have been reported on a few occasions, but without an analysis of what principle enables them. This, however, should be surprising given the significant differences in the demographics between likely voters and users of online social networks.

This work aims to test the predictive power of social media metrics against several Senate races of the two recent US Congressional elections. We review the findings of other researchers and try to duplicate their findings in terms of both data volume and sentiment analysis. Our research aim is to shed light on why predictions of electoral (or other social) events using social media might or might not be feasible. In this paper, we offer two conclusions and a proposal: First, we find that electoral predictions using the published research methods on Twitter data are not better than chance. Second, we reveal some major challenges that limit the predictability of election results through data from social media. We propose a set of standards that any theory aiming to predict elections (or other social events) using social media should follow.

I. INTRODUCTION

In recent years, the use of social media for communication has dramatically increased. Research has shown that 22% of adult internet users were engaged with the political campaign on Twitter, Facebook and Myspace in the months leading up to the November 2010 US elections [1]. Empowered by the APIs that many social media companies make available, researchers are engaged in an effort to analyze and make sense of the data collected through these social communication channels. Theoretically, social media data, if used correctly, can lead to predictions of events in the near future influenced by human behavior. To describe this phenomenon, [2] talk about “predicting the future” while [3] have coined the term “predicting the present”. In fact, researchers have reported that the volume of Twitter chat over time can be used to predict several kinds of consumer metrics, such as the likelihood of success of new movies before their release [2] and the marketability of consumer goods [4]. These predictions are explained by the perceived ability of Twitter chat volume and Google Search Trends to monitor and record general social trends as they occur.

Being able to make predictions based on publicly available data would have numerous benefits in areas such as health (e.g., predictions of flu epidemics [5], [6]), business (e.g., prediction of box-office success of movies [7] and product marketability [4]), economics (e.g., predictions on stock market trends and housing market trends [3], [8], [9]), and politics (e.g., trends in public opinion [10]), to name a few.

However, there have also been reports on Twitter’s ability to predict with amazing accuracy the voting results in the recent 2009 German elections [11] and in the 2010 US Congressional elections [12]. Given the significant differences in the demographics between likely voters and users of social networks [1], questions arise on what the underlying operating principle enabling these predictions is. Could it be simply a matter of coincidence, or is there a reason why general trends are as accurate as specific demographics? Should we expect these methods to be accurate again in future elections? These are the questions we seek to address with our work.

The rest of this paper is organized as follows: Section II reviews past research on electoral predictions using social media data. Section III describes a number of new experiments we conducted testing the predictability of the last two rounds of US elections based on Twitter volume and sentiment analysis. Section IV describes a set of standards that any methodology of electoral predictions should follow in order to be consistently competent against the statistical sampling methods employed by professional pollsters. The final section V has our conclusions and proposes new lines of research.

II. PREDICTING PAST ELECTIONS

In the previous section we mentioned some of the attempts to use Twitter and Google Trends for predictions of real-world outcomes and external market events. What about the important area of elections? One would expect that, following the previous research literature (e.g., [11], [12]), and given the high utilization of the Web and online social networks in the US [1], Twitter volume should have been able to predict consistently the outcomes of the US Congressional elections. Let us examine the instances and methods that have been used in the past in claims of electoral result predictions and discuss their predictive power.
A. Claims that Social Media Data predicted elections

The word “prediction” means foreseeing the outcome of events that have not yet occurred. In this sense, the authors are not aware of any publications or claims that, using social media data, someone was able to propose a method that would predict correctly and consistently the results of elections before the elections happened. What has happened, however, is that on several occasions, post processing of social media data has resulted in claims that they might have been able to make correct electoral predictions. Such claims are discussed in the following subsection.

B. Claims that Social Media Data could have predicted elections

Probably due to the promising results achieved by many of the projects and studies discussed in section I, there is a relatively high amount of hype surrounding the feasibility of predicting electoral results using social media. It must be noted that most of that hype is fueled by traditional media and blogs, usually bursting before and after electoral events. For example, shortly after the recent 2010 elections in the US, flamboyant statements made it to the news media headlines, from those arguing that Twitter is not a reliable predictor (e.g., [13]) to those claiming just the opposite, that Twitter (and Facebook) was remarkably accurate (e.g., [14]). Moreover, the degree of accuracy of these “predictions” was usually assessed in terms of the percentage of correctly guessed electoral races – e.g., the winners of 74% of the US House and 81% of the US Senate races were predicted [15] – without further qualification. Such qualifications are important since a few US races are won by very tight margins, while most of them are won with comfortable margins. These predictions were not compared against traditional ways of prediction, such as professional polling methods, or even trivial prediction methods based on incumbency (the fact that those who are already in office are far more likely to be re-elected in the US).

Compared to the media coverage, the number of scholarly works on the feasibility of predicting popular opinion and elections from social media is relatively small. Nevertheless, it does tend to support a positive opinion on the predictive power of social media as a promising line of research, while exposing some caveats of the methods. Thus, according to [16], the number of Facebook fans for election candidates had a measurable influence on their respective vote shares. These researchers assert that “social network support, on Facebook specifically, constitutes an indicator of candidate viability of significant importance [...] for both the general electorate and even more so for the youngest age demographic.”

A study of a different kind was conducted by [10]. They analyzed the way in which simple sentiment analysis methods could be applied to tweets as a tool for automatically pulsing public opinion. These researchers correlated the output of such a tool with the temporal evolution of different indices, such as the index of Consumer Sentiment, the index of Presidential Job Approval, and several pre-electoral polls for the US 2008 Presidential Race. The correlation with the first two indices was rather high, but it was not significant for the pre-electoral polls, and they conclude that sentiment analysis on Twitter data seems to be a promising field of research to replace traditional polls although, they find, it’s not quite there yet.

The work by [11] focuses directly on whether Twitter can serve as a predictor of electoral results. In that paper, a strong statement is made about predictability, namely that “the mere number of tweets mentioning a political party can be considered a plausible reflection of the vote share and its predictive power even comes close to traditional election polls.” In fact, they report a mean average error (MAE) of only 1.65%. Moreover, these researchers found that co-occurrence of political party mentions accurately reflected close political positions between political parties and plausible coalitions.

More recently, [12] used the tweets sent by the electoral candidates, not the general public, and reported success in “building a model that predicts whether a candidate will win or lose with accuracy of 88.0%”. While this concluding statement seems strong, a closer look at the claims reveals that they found their model to be less successful, as they admit that “applying this technique, we correctly predict 49 out of 63 (77.7%) of the races”.

C. Claims that Social Media Data did not predict the elections

The previous subsection reveals some inconsistencies with electoral predictions in scholarly publications. While candidate counts of Twitter messages predicted with remarkable accuracy electoral results in Germany in 2009 [11], a more elaborate method did not correlate well with pre-electoral polls in the US 2008 Presidential elections [10]. Could it be that some of those results were just a matter of chance or the side-effect of technical problems? Who is right?

The work by [17] focuses on the use of Google search volume (not Twitter) as a predictor for the 2008 and 2010 US Congressional elections. They divided the electoral races into groups depending on the degree to which they were contested by the candidates, and they find that only a few groups of races were “predicted” above chance using Google Trends – in one case achieving 81% correct results. However, they report that those promising results were achieved by chance: while the best group’s predictions were good in 2008 (81%), for the same group the predictions were very poor in 2010 (34%).

Importantly, even when the predictions were better than chance, they were not competent compared to the trivial method of predicting through incumbency. For example, in 2008, 91.6% of the races were won by incumbents. Even in 2010, in elections with major public discontent, 84.5% of the races were won by incumbents. Given that, historically, the incumbent candidate gets re-elected about 9 out of 10 times, the baseline for any competent predictor should be the incumbent re-election rate. According to such a baseline, Google search volume proves to be a poor electoral predictor. Compared to professional pollsters (e.g., The New York Times), the predictions were far worse; and, in some groups of races the predictions were even worse than chance!
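To make this baseline argument concrete, here is a minimal sketch (in Python, with invented race data; the function and variable names are ours, not from [17]) of how a proposed predictor should be scored against the trivial incumbency rule:

    def accuracy(predicted_winners, actual_winners):
        """Fraction of races whose winner was guessed correctly."""
        hits = sum(p == a for p, a in zip(predicted_winners, actual_winners))
        return hits / len(actual_winners)

    # Stand-in data: the incumbent, the actual winner, and what some
    # hypothetical predictor said, for five races.
    incumbents = ["A", "B", "C", "D", "E"]
    winners    = ["A", "B", "C", "D", "X"]   # incumbents won 4 of 5 races
    proposed   = ["A", "Y", "C", "Z", "X"]   # a hypothetical predictor

    print(accuracy(incumbents, winners))  # 0.8 <- the trivial baseline
    print(accuracy(proposed, winners))    # 0.6 <- worse than the baseline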
In [18], the sentiment analysis methods of [10] and [11] are applied to tweets obtained during the US 2008 Presidential elections (Obama vs. McCain). [18] assigned a voting intention to every individual user in the dataset, along with the user’s geographical location. Thus, electoral predictions were computed for different states instead of simply the whole of the US, and it was found that every method examined would have largely overestimated Obama’s victory, predicting (incorrectly) that Obama would have won even in Texas. In addition, [18] provides some suggestions on the way in which such data could be filtered to improve prediction accuracy. In this sense, it points out that demographic bias in the user base of Twitter and other social media services is an important electoral factor and that, therefore, bias in the data should be corrected according to user demographic profiles.

Recently, [19] provided a thorough response to the work of [11], arguing that those authors relied on a number of arbitrary choices which make their method virtually useless for future elections. They point out that, by taking into account all of the parties running for the elections, the method by [11] would actually have predicted a victory for the Piratenpartei (Pirate Party), which received 2% of the votes but no seats in the German parliament.

In this paper we decided to examine more closely the claims of electoral predictions described in the previous subsection. Since we had collected Twitter data from the US Congressional Elections in 2010, we were in a position to examine whether the proposed methods were as successful in instances other than the ones they were developed for. Moreover, we wanted to analyze why electoral predictions using social media may (or may not) be possible. In the next section III we describe our computational experiments, and in section IV we analyze the operating models behind electoral predictions.

III. NEW EXPERIMENTS ON TWITTER AND ELECTIONS

For our study, we used two data sets related to elections that took place in the US during 2010. Predictions were calculated based on Twitter chatter volume, as in [11], and then based on sentiment analysis of tweets, in ways similar to [10]. While we did not have comparable data to examine the methods of [12], we discuss some of its findings in the next section.

The first data set belongs to the 2010 US Senate special election in Massachusetts (“MAsen10”), a highly contested race between Martha Coakley (D) and Scott Brown (R). The data set contains 234,697 tweets contributed by 56,165 different Twitter accounts, collected with the use of the Twitter streaming API, configured to retrieve near real-time tweets containing the name of either of the two candidates. The collection took place from January 13 to January 20, 2010, the day after the election.

The second data set contains all the tweets provided by the Twitter “gardenhose” in the week from October 26 to November 1, the day before the general US Congressional elections of November 2, 2010 (“USsen10”). The gardenhose provides a uniform sampling of the Twitter data. The daily snapshots contained between 5.6 and 7.7 million tweets. Using the names of the candidates for five highly contested races for the US Senate, 13,019 tweets were collected, contributed by 6,970 different Twitter accounts.

These two datasets are different. MAsen10 is an almost complete set of tweets, while USsen10 provides a random sample; but because of its randomness, it should accurately represent the volume and nature of tweets during that pre-election week.

The first prediction method we examined is the one described by [11], which consists of counting the number of tweets mentioning each candidate. According to that study, the proportion of tweets mentioning each candidate should closely reflect the actual vote share in the election. Tweets containing the names of both candidates were not included, focusing only on tweets mentioning one candidate at a time.
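In outline, the volume method amounts to the following (a minimal Python sketch with our own naming and stand-in tweets, not the original code):

    from collections import Counter

    def volume_shares(tweets, candidates):
        """Predicted vote shares from mention counts, in the spirit of [11].
        Tweets mentioning both candidates (or neither) are discarded."""
        c1, c2 = candidates
        counts = Counter()
        for text in tweets:
            lowered = text.lower()
            mentioned = [c for c in (c1, c2) if c.lower() in lowered]
            if len(mentioned) == 1:  # keep single-candidate mentions only
                counts[mentioned[0]] += 1
        total = counts[c1] + counts[c2]
        return {c: counts[c] / total for c in (c1, c2)}

    # Example with invented tweets:
    tweets = ["Go Brown!", "Coakley for Senate",
              "Brown and Coakley debate tonight", "brown wins"]
    print(volume_shares(tweets, ("Coakley", "Brown")))
    # {'Coakley': 0.33..., 'Brown': 0.66...}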
The second prediction method extends the ideas from [10], which described a way to compute a sentiment score for a topic being discussed on Twitter. To that end, [10] relied on the subjectivity lexicon collected by [20] and labeled tweets containing any positive word as positive tweets, and the ones containing any negative word as negative tweets. Then, the sentiment score is defined as the ratio between the number of positive and negative tweets. It must be noted that, according to [10], the number of polarized words in a tweet is not important, and tweets can be simultaneously considered positive and negative. In addition, sentiment scores for topics with very different volumes of tweets are not easily comparable. Because of these issues, some changes had to be made to [10]’s approach in order to compute predicted vote shares. In our study, the lexicon employed is also [20], but tweets are considered either positive or negative, not both. Every tweet is labeled as positive, negative, or neutral, based on the sum of its labeled words (positive words contribute +1, while negative words contribute -1). A tweet is labeled neutral when the sum of polarized words is 0, or when no contributing words appear in it. Given the two-party nature of the races, the vote share is calculated with this formula:

$$\text{vote share}(c_1) = \frac{pos(c_1) + neg(c_2)}{pos(c_1) + neg(c_1) + pos(c_2) + neg(c_2)} \qquad (1)$$

where $c_1$ is the candidate for whom support is being computed while $c_2$ is the opposing candidate; $pos(c)$ and $neg(c)$ are, respectively, the number of positive and negative tweets mentioning candidate $c$.
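The labeling rule and Equation (1) can be sketched as follows (a Python illustration under our reading of the method; the three-word lexicon is only a placeholder for the full subjectivity lexicon of [20]):

    # Placeholder lexicon; the study uses the subjectivity lexicon of [20].
    POSITIVE = {"great", "win", "support"}
    NEGATIVE = {"bad", "lose", "against"}

    def polarity(text):
        """Sum of word polarities: +1 per positive word, -1 per negative."""
        return sum((w in POSITIVE) - (w in NEGATIVE) for w in text.lower().split())

    def vote_share(tweets_c1, tweets_c2):
        """Equation (1): predicted share for candidate c1, given the tweets
        mentioning c1 and the tweets mentioning c2."""
        pos = lambda ts: sum(polarity(t) > 0 for t in ts)
        neg = lambda ts: sum(polarity(t) < 0 for t in ts)
        numerator = pos(tweets_c1) + neg(tweets_c2)
        denominator = (pos(tweets_c1) + neg(tweets_c1)
                       + pos(tweets_c2) + neg(tweets_c2))
        return numerator / denominator

Tweets whose polarity sum is 0 count as neutral and simply drop out of both the numerator and the denominator.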
A. Results of Applying the Prediction Methods

For the MAsen10 data it was possible to make a more detailed analysis, since the data contained tweets from before election day (6 days of data), election day (20 hours of data), and post-election (10 hours of data). The 47,368 tweets that mentioned both candidates were not used.

Table I shows the number of tweets mentioning each candidate and the election results predicted from the volume. The total count of tweets we collected (53.25% - 46.75% in favor of Brown) closely reflects the election outcome (Brown 51.9% - Coakley 47.1%). Correct prediction?
                          Coakley            Brown
                          #tweets     %      #tweets     %
    Pre-elec. (6 days)     52,116  53.86      44,654  46.14
    Elec. day (20 hrs)     21,076  49.94      21,123  50.06
    Post-elec. (10 hrs)    14,381  29.74      33,979  70.26
    Total                  87,573  46.75      99,756  53.25

TABLE I
The share of tweets for each candidate in the MAsen10 data set. Notice that the pre-election share didn’t predict the final result (Brown won 51.9% of the votes).

We refrained from declaring victory in the predictive power of Twitter when we realized that the share of volume for the pre-election period actually predicted a win for Coakley, not Brown. Table I also shows how the number of tweets was affected by electoral events. Brown received 1/3 of all his mentions in the 10 hours post-election, when everyone started talking about his win, an important win that would have repercussions for the health care reform, a major issue at the time. Brown’s win broke the filibuster-proof power of Democrats in the US Senate and produced a lot of tweets.

While the simple Twitter share of pre-election tweets couldn’t predict the result of the MAsen10 election, applying sentiment analysis to tweets and calculating the vote share with Equation (1) comes close to the electoral results, as shown in Table II. For a second time in our research effort we refrained from declaring victory in Twitter’s power of predicting elections, and decided to take a closer look at our data.

                   Coakley    Brown
    Pre-election     46.5%    53.5%
    Election-day    44.25%    55.8%
    Post-election    27.2%    72.8%
    All              41.0%    59.0%

TABLE II
Predictions for the MAsen10 data set based on the sentiment analysis vote share. The pre-election prediction correctly predicts Brown as the winner with a small error (1.1% for corrected election results; also see Table III).

The two prediction methods were further applied to 5 other highly contested Senate races from the USsen10 data set. The results of the 6 races are summarized in Table III. The actual results of the election don’t always sum up to 100% because in a few races more than two candidates participated. So, in order to calculate the mean average error (MAE), the results were normalized to sum up to 100%. Using the values of the corrected election results, MAE values were calculated for both methods. The Twitter volume method had an error of 17.1%, while the sentiment analysis had an error of 7.6%. In other words, both MAE values are unacceptably high. Each method was able to correctly predict the winner in only half of the races.

    State  Senate Race                  Election Result  Normalized Result  Twitter Volume    Sentiment Analysis
    MA     Coakley [D] vs Brown [R]     47.1% - 51.9%    47.6% - 52.4%      53.9% - 46.1%     46.5% - 53.5% *
    CO     Bennet [D] vs Buck [R]       48.1% - 46.4%    50.9% - 49.1%      26.3% - 73.7%     63.3% - 36.7% *
    NV     Reid [D] vs Angle [R]        50.3% - 44.5%    53.1% - 46.9%      51.2% - 48.8% *   48.4% - 51.6%
    CA     Boxer [D] vs Fiorina [R]     52.2% - 44.2%    54.1% - 45.9%      57.9% - 42.1% *   47.8% - 52.2%
    KY     Conway [D] vs Paul [R]       44.3% - 55.7%    44.3% - 55.7%       4.7% - 95.3% *   43.1% - 56.9% *
    DE     Coons [D] vs O’Donnell [R]   56.6% - 40.0%    58.6% - 41.4%      32.1% - 67.9%     38.8% - 61.2%

TABLE III
The summary of electoral and predicted results for 6 highly contested Senate races. Entries marked with an asterisk show races where the winner was predicted correctly by the technique. Both the Twitter volume and the sentiment analysis methods were able to predict correctly 50% of the races. In this sample, incumbents won in all the races they ran (NV, CA, CO), and 84.5% of all 2010 races.
CORRECTED ELECTION RESULTS , ALSO SEE TABLE III).
1) Compared against manually labeled tweets: To evalu-
ate the accuracy of the above described sentiment analysis
method, a set of tweets were manually assigned to one of
the following labels: opposing Brown, opposing Coakley,
We refrained from declaring victory in the predictive power supporting Brown, supporting Coakley, or neutral. This set of
of Twitter when we realized that the share volume for the tweets was chosen to reflect “one tweet, one vote”: From the
pre-election period, actually predicted a win for Coakley, set of Twitter users that had indicated their location in the state
not Brown. Table I also shows how the number of tweets of Massachusetts, we chose users with a single tweet in the
was affected by electoral events. Brown received 1/3 of all corpus. This set contains 2,259 tweets. We read the tweets and
his mentions in the 10 hours post-election, when everyone manually assigned labels to them. Our labels were compared
started talking about his win, an important win that would against those assigned by the automatic method, producing the
have repercussions for the health care reform, a major issue confusion matrix in Table IV.
at the time. Brown’s win broke the filibuster-proof power of The results show that the accuracy of the sentiment analysis
democrats in the US Senate and produced a lot of tweets. is only 36.85%, slightly better than a classifier randomly as-
While the simple Twitter share of pre-election tweets signing the same three labels (positive, negative, and neutral).
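As a check on that figure, 36.85% appears to be the unweighted (macro) average of the five per-label accuracies in Table IV, rather than a tweet-weighted accuracy. A short sketch under that reading, with our assumption about which automatic label counts as “correct” spelled out in comments:

    # Rows of Table IV: manual label -> (POS, NEG, NEUT) counts assigned by
    # the automatic method, plus the index of the column we assume correct
    # (NEG for opposing labels, POS for supporting labels, NEUT for neutral).
    matrix = {
        "opposing Brown":     ((124, 76, 150), 1),
        "opposing Coakley":   ((70, 67, 105), 1),
        "supporting Brown":   ((216, 45, 254), 0),
        "supporting Coakley": ((240, 72, 213), 0),
        "neutral":            ((249, 82, 296), 2),
    }
    per_label = [row[correct] / sum(row) for row, correct in matrix.values()]
    print(sum(per_label) / len(per_label))  # 0.3685 -> the 36.85% in the text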
2) Effect of misleading propaganda: A second evaluation was performed on a particular set of tweets, namely those included in a “Twitter bomb” targeted at Coakley [21], containing a series of tweets spreading misleading information about her. The corpus used in this study contained 925 tweets that were part of the Twitter bomb. According to the automatic sentiment analysis, 369 of them were positive messages, 212 were neutral, and only 344 were negative. While all of these tweets were part of an orchestrated smearing campaign against Coakley, most of them were characterized as neutral or even positive by the automatic sentiment analysis.

Therefore, we conclude that by just relying on polarity lexicons the subtleties of propaganda and disinformation are not only missed but even wrongly interpreted.

3) Relation to presumed political leaning: Finally, an additional experiment was conducted to test the assumption underlying this application of sentiment analysis, namely, that the political preference of users can be derived from their tweets. To derive the political preference from the tweets, for every user the corresponding tweets were grouped together and their accumulated polarity score was attributed to the user. The presumed political orientation of a user was calculated following the approach described by [22]. This approach makes use of the ADA scores, which range from 0 (most conservative) to 100 (most liberal). The ADA (Americans for Democratic Action) is a liberal political think-tank that publishes scores for each member of the US Congress according to their voting record on key progressive issues. Official Twitter accounts for 210 members of the House and 68 members of the Senate were collected. Then, the Twitter followers of all these accounts were collected, and every user received the average ADA score of the Congress members they were following. The number of Twitter users following the above mentioned 278 Congress members is roughly half a million. A little more than 14 thousand of them also appear in the MAsen10 dataset, and they are used in the following correlation analysis.
For each of these 14 thousand users four different scores are computed: their ADA score which, purportedly, would reflect their political leaning, their opinion on Brown, their opinion on Coakley, and their “voting orientation” for this particular election. The voting orientation is defined as the result of subtracting the opinion on Coakley from the opinion on Brown. Given the range of the ADA scores and the sign of the rest of the scores, the correlations between them should be as follows. The correlation between the ADA score and the opinion on Brown should be negative; after all, Republicans (closer to 0 on the ADA scale) should value Brown positively, and Democrats (closer to 100) should value him negatively. The opposite should be true for Coakley and, thus, a positive correlation should be expected. The ADA score and the voting orientation should also be negatively correlated, for the same reasons as the ADA score vs the opinion on Brown.
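A sketch of this computation (the data below is invented and pearson_r is our own helper; the actual study used the real follower graph and the full 14-thousand-user sample):

    from statistics import mean, stdev

    def pearson_r(xs, ys):
        """Pearson correlation coefficient between two equal-length lists."""
        mx, my = mean(xs), mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
        return cov / (stdev(xs) * stdev(ys))

    # Invented inputs: ADA scores of the Congress members each user follows,
    # and that user's accumulated polarity score toward Brown.
    followed_ada = {"user1": [5, 10, 20], "user2": [90, 85], "user3": [50, 60]}
    opinion_on_brown = {"user1": +3, "user2": -2, "user3": 0}

    users = sorted(followed_ada)
    avg_ada = [mean(followed_ada[u]) for u in users]
    opinions = [opinion_on_brown[u] for u in users]
    print(pearson_r(avg_ada, opinions))  # negative on this toy data, as hypothesized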
Table V shows the results of this experiment. The different scores do correlate as expected. However, the correlations are very weak, showing that they are essentially orthogonal to each other.

                                               Pearson’s r
    Opinion on Brown vs Avg. ADA scores       -0.150799848
    Opinion on Coakley vs Avg. ADA scores     +0.09304417
    Voting orientation vs Avg. ADA scores     -0.178902764

TABLE V
Correlation between averaged ADA scores (which purportedly reflect users’ political preference) and the opinions on the two candidates and the voting orientation. The correlations found are consistent with the initial hypotheses but too weak to be useful.

Based on these three experiments, we claim that the accuracy of lexicon-based sentiment analysis when applied to political conversation is quite poor. When compared against manually labeled tweets it seems to just slightly outperform a random classifier; it fails to detect and correctly assign the intent behind disinformation and misleading propaganda; and, finally, it’s a far cry from being able to predict political preference.

C. Could we have done better than that?

The previous subsection reviewed how the methods proposed to predict elections would have performed in several instances using data from the 2010 US Congressional elections. These experiments were important because a wider set of test cases was needed to base any claims of predictability of elections through Social Media.

Given the unsuccessful predictions we report, one might counter that “you would have done better if you did a different kind of analysis”. However, recall that we did not try to invent new techniques of analysis: we simply tried to repeat the (reportedly successful) methods that others have used in the past, and we found that the results were not repeatable.

IV. HOW TO PREDICT ELECTIONS

In the past, some research efforts have treated social media as a black box: it may give you the right answer, though you may not know why. We believe that there is an opportunity for intellectual contribution if research methods are accompanied with at least a basic reasonable model of why they would predict correctly. Next we discuss some standards that electoral predictions should obey in order to be repeatedly successful.

A. A method of prediction should be an algorithm.

This might seem a trivial point, but it is not always easy to follow when dealing with social media. Of course, every election might seem different, and adjustments in the data collection and analysis may be necessary. Nevertheless, these adjustments should be determinable beforehand because, as Duncan Watts [23] argues in his recent book, they all seem obvious afterwards.

More specifically, we propose that a method should clearly describe before the elections: (a) the way in which the Social Media data are to be collected, including the dates of data collection, (b) the way in which the cleanup of the data is to be performed (e.g., the selection of keywords relevant to the election), (c) the algorithms to be applied on the data along with their input parameters, and (d) the semantics under which the results are to be interpreted. A minimal sketch of such a pre-registered specification follows.
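As an illustration only (the record layout and field names are ours, not a proposed standard), points (a)-(d) could be frozen before election day in a published record as simple as:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class PredictionMethodSpec:
        """Hypothetical pre-registration of points (a)-(d) above."""
        collection_dates: tuple   # (a) data collection window
        keywords: tuple           # (b) cleanup / relevance filter
        algorithm: str            # (c) algorithm, and ...
        parameters: dict          # ... its input parameters
        interpretation: str       # (d) semantics of the output

    spec = PredictionMethodSpec(
        collection_dates=("2010-10-26", "2010-11-01"),
        keywords=("coakley", "brown"),
        algorithm="tweet volume share",
        parameters={"drop_tweets_mentioning_both": True},
        interpretation="share of single-candidate mentions ~ vote share",
    )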
The previous section observed that the currently available tools for analyzing large volumes of data are not always accurate. Sentiment analysis can get incorrect readings of sentiment, because the complexity of human communication cannot be easily described completely with a small set of non-contradicting rules. Hoping that the errors in sentiment analysis “somehow” cancel themselves out is not defensible.

B. Social Media Data are fundamentally different from Data from Natural Phenomena.

In particular, Social Media allow manipulation by those who have something to gain by manipulating them. Spammers and propagandists write programs that create lots of fake accounts and use them to tweet intensively, amplifying their message and polluting the data for any observer. It’s known that this has happened in the past (e.g., [21], [24]). It is reasonable to expect that, if the image presented by social media is important to some (advertisers, spammers, propagandists), there will likely be people who will try to tamper with it.

This brings up an important point in terms of selecting tools for analysis. Using on social media data the same analytical tools that one would use on data from natural phenomena may not result in repeatable predictions. For example, the social media metrics that post processing of candidates’ tweets found to increase prediction rates [12] will not likely be the same in the next elections. The candidates in the next elections will certainly manipulate their tweets in a different manner, and the metrics that will increase predictability in the next elections (if at all) will be different.

C. Form a testable theory on why and when it predicts.

Predicting elections with accuracy should not be supported without some clear understanding of why it works. If a theory to predict elections is to be identified, the research should be able to explain why this is the case in a testable way, and not treat it as a black box.

Related to this point is the establishment of a baseline for successful predictions. A success rate for elections that is close to chance is not an appropriate baseline, since there are trivial ways of prediction that are much better than that. For example, in 2008, incumbents won 91.6% of the races they ran, and in 2010, at a time of reportedly major upsets, the incumbents still won in 84.55% of the races they ran. Since in the US congressional elections the incumbent wins about nine out of ten times, the incumbency success rate is an appropriate baseline (as also [12], [17] propose). Similarly, many electoral districts are known to consistently elect candidates from the same party for years. Predictions performing below these trivial baselines should not be considered competent.

D. Learn from the professional pollsters.

This last point is not a necessary one, but it is one way through which predicting elections through social media could work. In particular, prediction can come through correctly identifying likely voters and getting an unbiased representative sample of them. That’s what professional pollsters have been doing for the last 80 years, with mostly impressive results, but that’s something that today’s Social Media cannot do. Below we describe the complexity of professional polling and explain the reasons why their methods cannot be duplicated by unsophisticated sampling of Social Media data.

Professional polling is based on statistically reliable sampling and is able to prove why it is successful. There is a long history of electoral predictions, and every year significant effort is made all over the world to make sure that predictions are as close to electoral results as possible. Those involved in this endeavor enjoy high visibility, fame when successful and ridicule when not. All the experts in the field agree, however, that the most important aspect of correct prediction is the selection of a representative and unbiased sample of the population.

Professional pollsters need to obtain a random sample of the people who will actually vote in order to achieve accurate predictions. To do that, one needs both a method for random sampling and access to whomever the random sampling requires to sample. Since one cannot always achieve this, one has to strive to come as close to this requirement as possible. Since one cannot be sure about who will actually vote, the prediction can be approximated by sampling those who will likely vote. A typical approach considers as a “likely voter” one who has voted in the previous elections. This is so because not every adult who has the right to vote will exercise it. For example, in the 2000 presidential elections, if one sampled randomly the registered voters – 80% of whom actually voted – one would be able to make far more accurate predictions than one who sampled just the eligible adults – 52% of whom actually voted [25]. A good random sampling method should turn out samples of equal numbers of people to be sampled by age group. However, the final calculation should not include the sample results of each age group without age adjustment. This is because in 2000, only 36% of citizens between the ages of 18 and 24 voted, compared to 50% of those between 25 and 34 and 68% of those over 35.
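To illustrate that age adjustment (a hypothetical sketch assuming, for simplicity, equally sized age groups in the population; the turnout rates are the 2000 figures just quoted, while the candidate preferences are invented):

    # 2000 turnout rates quoted above, by age group.
    turnout = {"18-24": 0.36, "25-34": 0.50, "35+": 0.68}

    # Invented example: equal-sized samples per age group, with the share
    # of each group's sample favoring candidate X.
    support_for_x = {"18-24": 0.60, "25-34": 0.52, "35+": 0.44}

    # Weight each group's preference by its expected share of actual voters.
    total = sum(turnout.values())
    adjusted = sum(support_for_x[g] * turnout[g] for g in turnout) / total
    unadjusted = sum(support_for_x.values()) / len(support_for_x)

    print(f"unadjusted: {unadjusted:.1%}, age-adjusted: {adjusted:.1%}")
    # unadjusted: 52.0%, age-adjusted: 50.3%

The adjustment discounts the preferences of the youngest group, whose members are the least likely to actually vote.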
Consider then the unfiltered sample which can be obtained today from social media data, such as those provided by Twitter, Facebook, Myspace or other popular social networking services. To be comparable with the results of professional pollsters, a correct sample from Twitter should be able to identify the age range, voting eligibility and prior voting pattern of the tweeters. However, there are currently no means of collecting this information reliably, at least without intrusive methods that compromise privacy. But even then, a really random sample of the likely voters is still unattainable, because only those who have an active Twitter account and have decided to tweet about the election can be observed. Collecting social media data today is like going to a political rally and sampling the people gathered there, expecting that it will provide an accurate representation of the likely voters. Instead, a highly biased sample will be found. It would not help much to go to every political rally, because the large number of voters who attend no rally will still be missing.
V. CONCLUSIONS

This research has revealed that data from social media did only slightly better than chance in predicting election results in the last US Congressional elections. We argue that this makes sense: so far, only a very rough estimation of the exact demographics of the people discussing elections in social media is known, while, according to state-of-the-art polling techniques, correct prediction requires the ability to sample likely voters randomly and without bias. Moreover, answers to several pertinent questions are needed, such as the actual nature of political conversation in social media, the relation between political conversation and electoral outcomes, and the way in which different ideological groups and activists engage with and influence online social networks.

In this paper we have also described three necessary standards that any theory aiming to predict elections competently and consistently using Social Media data should follow: the prediction theory should be an algorithm with carefully predetermined parameters, the data analysis should be aware of the difference between social media data and natural phenomena data, and it should contain some explanation of why it works. We argue that one way to do that would be to establish a sampling method comparable to the ones used by professional pollsters, though there are many obstacles to doing so today.

In addition to that, further research is needed regarding the flaws of simple sentiment analysis methods when applied to political conversation. In this sense it would be very interesting to understand the impact of different lexicons and to go one step further by using machine learning techniques (such as in the work by [2]). Also, there is a need for a deeper understanding of the dynamics of political conversation in social media (following the work of [26]).

Finally, we point out that our results do not argue against having a strategy for involving social media in a candidate’s election campaign. Instead, they argue that, just because a candidate is scoring high in some social media metrics (e.g., number of Facebook friends or Twitter followers), this performance does not guarantee electoral success.

ACKNOWLEDGMENT

The Twitter data for the November election was courtesy of the Center for Complex Networks and Systems Research at the Indiana University School of Informatics and Computing. The work of P. Metaxas and E. Mustafaraj was partially supported by NSF grant CNS-1117693.

REFERENCES

[1] A. Smith, “Twitter and social networking in the 2010 midterm elections,” Pew Research, 2011, http://bit.ly/heGpQX.
[2] S. Asur and B. A. Huberman, “Predicting the future with social media,” CoRR, vol. abs/1003.5699, 2010, http://arxiv.org/abs/1003.5699.
[3] H. Choi and H. Varian, “Predicting the present with Google Trends,” Official Google Research Blog, 2009, http://bit.ly/h9RRdW.
[4] Y. Shimshoni, N. Efron, and Y. Matias, “On the predictability of search trends,” Google Research Blog, 2009, http://doiop.com/googletrends.
[5] J. Ginsberg, M. H. Mohebbi, R. S. Patel, L. Brammer, M. S. Smolinski, and L. Brilliant, “Detecting influenza epidemics using search engine query data,” Nature, vol. 457, no. 7232, pp. 1012–4, 2009, http://1.usa.gov/gEHbtH.
[6] V. Lampos, T. D. Bie, and N. Cristianini, “Flu detector – tracking epidemics on Twitter,” Machine Learning and Knowledge Discovery in Databases, vol. 6323, pp. 599–602, 2010.
[7] G. Mishne, “Predicting movie sales from blogger sentiment,” in AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), 2006.
[8] J. Bollen, H. Mao, and X.-J. Zeng, “Twitter mood predicts the stock market,” Journal of Computational Science, vol. 2, no. 1, pp. 1–8, 2011.
[9] E. Gilbert and K. Karahalios, “Widespread worry and the stock market,” in Proc. of 4th ICWSM, 2010, http://bit.ly/qoz4lh.
[10] B. O’Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith, “From tweets to polls: Linking text sentiment to public opinion time series,” in Proc. of 4th ICWSM. AAAI Press, 2010, pp. 122–129.
[11] A. Tumasjan, T. Sprenger, P. G. Sandner, and I. M. Welpe, “Predicting elections with Twitter: What 140 characters reveal about political sentiment,” in Proc. of 4th ICWSM. AAAI Press, 2010, pp. 178–185.
[12] A. Livne, M. Simmons, E. Adar, and L. Adamic, “The party is over here: Structure and content in the 2010 election,” in Proc. of 5th ICWSM, 2011, http://bit.ly/q9lSug.
[13] P. Goldstein and J. Rainey, “The 2010 elections: Twitter isn’t a very reliable prediction tool,” LA Times Blog, 2010, http://lat.ms/fSXqZW.
[14] A. Carr, “Facebook, Twitter election results prove remarkably accurate,” Fast Company, 2010, http://bit.ly/dW5gxo.
[15] Facebook, “The day after election day (press release),” Facebook Notes, 2010, http://on.fb.me/hNcIgZ.
[16] C. B. Williams and G. J. Gulati, “The political impact of Facebook: Evidence from the 2006 midterm elections and 2008 nomination contest,” Politics & Technology Review, vol. 1, pp. 11–21, 2008.
[17] C. Lui, P. T. Metaxas, and E. Mustafaraj, “On the predictability of the U.S. elections through search volume activity,” in e-Society Conference, 2011, http://bit.ly/gJ6t8j.
[18] D. Gayo-Avello, “A warning against converting social media into the next Literary Digest,” in CACM (to appear), 2011.
[19] A. Jungherr, P. Jürgens, and H. Schoen, “Why the Pirate Party won the German election of 2009 or the trouble with predictions: A response to ‘Predicting elections with Twitter: What 140 characters reveal about political sentiment’,” Social Science Computer Review, 2011, http://bit.ly/nQU4Zx.
[20] T. Wilson, J. Wiebe, and P. Hoffmann, “Recognizing contextual polarity in phrase-level sentiment analysis,” in Proc. of Human Lang. Tech. and Empir. Meth. in NLP, ser. HLT ’05. Stroudsburg, PA, USA: ACL, 2005, pp. 347–354, http://dx.doi.org/10.3115/1220575.1220619.
[21] P. T. Metaxas and E. Mustafaraj, “From obscurity to prominence in minutes: Political speech and real-time search,” in WebSci10, 2010, http://bit.ly/h3Mfld.
[22] J. Golbeck and D. L. Hansen, “Computing political preference among Twitter followers,” in Proc. of Human Factors in Comp. Sys., 2011.
[23] D. Watts, Everything Is Obvious: Once You Know the Answer. Crown Publishing Group, 2011, http://bit.ly/q2cUT6.
[24] E. Mustafaraj, S. Finn, C. Whitlock, and P. Metaxas, “Vocal minority versus silent majority: Discovering the opinions of the long tail,” in Proc. of IEEE SocialCom, 2011.
[25] M. Blumenthal, “The why and how of likely voters,” Online Blog, 2004, http://bit.ly/dQ21Xj.
[26] S. Somasundaran and J. Wiebe, “Recognizing stances in ideological online debates,” in CAAGET ’10, 2010.
