Using social media and personality traits to assess software developers’ emotional polarity

PeerJ Computer Science

Introduction

  • Proposes a new approach to evaluate workers’ sentiment polarity over extended periods without interfering with professional tasks, opening the possibility of monitoring emotion polarity in real-world professional environments;

  • Evaluates the proposed approach using software developers as a case study since the software industry is a relevant activity sector that relies on complex human activities that are highly dependent on developers’ emotions;

  • Considers different possibilities for implementing the proposed approach by evaluating the accuracy of different unsupervised sentiment analysis methods in classifying the sentiment polarity of software developers’ posts on Twitter. The classifications produced by the sentiment analysis methods were compared with the manual classification of a sample of posts by a team of expert psychologists (used as the reference), showing that the best methods could classify the polarity of posts with a macro F1-Score of 0.745 and an accuracy of 0.768;

  • Benchmarks five lexicon-based sentiment analysis methods and their ensembles, 31 combinations in total, and identifies and ranks the best of them, guiding the choice of the best alternative for assessing the emotions of software developers through non-intrusive analysis of social media data;

  • Shows that the polarity of developers’ posts on Twitter can differ substantially between working and non-working periods for some developers, demonstrating that the proposed approach of assessing developers’ emotions from social media data is not only non-intrusive but also covers both working and non-working time;

  • Proposes the use of the Big Five dimensions of personality as a factor to weight the tweet polarities assessed by lexicon-based sentiment analysis methods, supporting the identification of abnormally negative or positive periods that may influence software development;

  • Makes the anonymized dataset used in this study publicly available in the companion data (https://doi.org/10.5281/zenodo.7846996), including survey answers, ethics committee documents, analysis results, all generated charts, and the data analysis.
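
The two evaluation metrics named above, accuracy and macro F1-Score, can be illustrated with a minimal sketch. The function names and the toy labels below are assumptions made for illustration; they are not taken from the study's source code or dataset.

```python
LABELS = ("negative", "neutral", "positive")


def accuracy(reference, predicted):
    """Fraction of posts whose predicted polarity matches the reference label."""
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def macro_f1(reference, predicted, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(reference, predicted))
        tp = sum(r == label and p == label for r, p in pairs)
        fp = sum(r != label and p == label for r, p in pairs)
        fn = sum(r == label and p != label for r, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that never appears in either list contributes an F1 of 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels, invented for illustration only (not from the study's dataset).
reference = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
predicted = ["positive", "negative", "neutral", "negative", "negative", "positive"]

print(round(accuracy(reference, predicted), 3))
print(round(macro_f1(reference, predicted), 3))
```

Macro averaging gives each polarity class the same weight regardless of how many posts fall into it, which matters when one class (often neutral) dominates the sample.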

Background and state of the art

Sentiment and emotions

Personality traits

Text sentiment analysis

  • SentiStrength (Thelwall, Buckley & Paltoglou, 2012): a well-known sentiment analysis method that uses a human-labeled lexical dictionary enhanced by machine learning. This method uses an expanded version of the LIWC dictionary, adding new features for the social media context;

  • Sentilex-PT (Carvalho & Silva, 2015): a sentiment lexicon specifically designed for sentiment and opinion analysis about human entities in texts written in Portuguese, consisting of 7,014 lemmas and 82,347 inflected forms;

  • Linguistic Inquiry and Word Count (LIWC) (Pennebaker et al., 2015): analyzes texts to detect emotional, social, and cognitive words as well as standard linguistic dimensions. Although LIWC has several metrics, we employed only those related to emotional polarities in this study;

  • VADER (Hutto & Gilbert, 2014): a gold-standard lexicon and rule-based sentiment analysis tool claimed to perform exceptionally well in a social media context;

  • OpLexicon (Souza et al., 2011): a sentiment lexicon for the Portuguese language built using multiple sources of information. The lexicon has around 15,000 polarized words classified by their morphological category, annotated with positive, negative, and neutral polarities.
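
The general idea shared by lexicon-based methods can be sketched as follows: each word found in a polarity lexicon contributes to a score, and the sign of the total decides the label. This is a deliberately simplified illustration, not the algorithm of any of the five tools listed above; the tiny lexicon and the function names are assumptions.

```python
import re

# Toy lexicon, invented for illustration; real lexicons such as OpLexicon or
# Sentilex-PT contain thousands of entries with richer annotations.
TOY_LEXICON = {
    "bom": 1, "feliz": 1, "adorei": 1,      # positive words
    "ruim": -1, "triste": -1, "odeio": -1,  # negative words
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def classify_polarity(text, lexicon=TOY_LEXICON):
    """Sum the polarity of known words and map the sign of the sum to a label."""
    score = sum(lexicon.get(token, 0) for token in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_polarity("Adorei o novo framework, muito bom!"))
print(classify_polarity("Odeio esse bug"))
print(classify_polarity("Deploy feito hoje"))
```

Real tools add rules on top of this word lookup, for example handling of negation, intensifiers, emojis, and punctuation, which this sketch omits.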

State of the art

Proposed approach and implementation details

  • Our approach is non-intrusive. Using social media posts as a data source to assess sentiment polarity does not cause any form of disturbance, interruption, or deviation from the professional activities of software developers. Thus, our approach takes advantage of social media posts without introducing new activities that may disturb professional tasks;

  • Our approach allows looking back in time. Users post their social media content deliberately, each at their own frequency, generating a large amount of data from the day they register on a social media platform. Thus, we can look back in time at users’ activities, classify the sentiment polarity of their posts, and understand their sentiment patterns day by day, using this valuable information to suggest software development improvements;

  • Our approach is non-inductive. We consider social media posts that users have already written or the ones they will ordinarily write. We do not ask for or force any post at a specific time or context. What we get from social media is what users deliberately post;

  • We use consolidated methods. Our approach employs five traditional lexicon-based methods of sentiment analysis and creates 31 combinations of them to analyze the dataset;

  • We use open-context data. Although social media posts might not, in general, be related to software engineering, their polarities affect professional activities (Weiss & Cropanzano, 1996), since these polarities result from what users live and face and how they react. We analyzed users’ open-context posts from an open-context platform and did not apply any restrictions related to the text content or context.
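
The count of 31 combinations follows from taking every non-empty subset of the five methods (2^5 - 1 = 31). The sketch below enumerates those subsets and shows one possible way to combine the labels of an ensemble. The majority-vote rule and its tie-breaking to neutral are assumptions made for illustration; the combination rule actually used in the study is not described in this section.

```python
from collections import Counter
from itertools import combinations

METHODS = ["SentiStrength", "Sentilex-PT", "LIWC", "VADER", "OpLexicon"]


def all_ensembles(methods):
    """Every non-empty subset of the methods: 2**n - 1 subsets in total."""
    return [
        combo
        for size in range(1, len(methods) + 1)
        for combo in combinations(methods, size)
    ]


def majority_vote(labels):
    """Most common label; a tie for first place resolves to 'neutral' (assumption)."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "neutral"
    return counts[0][0]


ensembles = all_ensembles(METHODS)
print(len(ensembles))  # 31
print(majority_vote(["positive", "positive", "negative"]))  # positive
```

Single methods are included as ensembles of size one, which is why the five individual methods and their 26 multi-method combinations add up to 31.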

Experimental study

Participants

  • Having a completely open profile, with an explicit location and direct messages enabled;

  • Having at least one tweet per day, considering the study period;

  • Being a Brazilian software developer who lives in Brazil;

  • Having posts mainly in the Brazilian Portuguese language.

Evaluators

Tweets dataset

Lexicons

Ethics and privacy

Results and discussion

Manual analysis

  • Consider only tweets that contain text;

  • Ignore external links;

  • Assess each tweet’s emotional polarity (positive, negative, or neutral) according to its written content;

  • Treat tweets with factual information as neutral;

  • Classify tweets with mixed sentiments as positive or negative based on the most relevant sentiment;

  • Take into account positive and negative emphasis, such as emojis, punctuation, and capital letters.

Lexicon analysis (RQ1)

Personality traits correlation (RQ2)

Working vs non-working tweets (RQ3)

The approach in a software engineering environment

  • Displays each developer’s emotional state against a configurable threshold, visually highlighting developers who require attention from software managers. For instance, the dashboard shows the developers whose emotional polarities deviate significantly from their baseline or those with consistently negative sentiments. This could provide software managers with an at-a-glance view of which developers may need extra support or attention;

  • Shows the team’s emotional sentiment polarity baseline, calculated as the mean of the software developers’ emotional polarities over a given period. This allows software managers to have a general understanding of the team’s overall emotional sentiment and gain valuable insight into the team’s emotional state. Software managers can then use this information to identify any correlations with team productivity or quality and take appropriate actions to improve it;

  • Provides a detailed breakdown of tweets and their sentiment polarities over a given period. The tool highlights specific areas, divided into positive and negative. The negative area denotes a prolonged period of consecutive days or weeks with a negative trend in emotional sentiment polarity for a particular developer. Conversely, the positive area denotes a prolonged period of consecutive days or weeks with a positive trend for that developer. This gives software managers a clear understanding of developers’ emotional polarities so that they can act accordingly.
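
The three dashboard features above can be sketched in a few functions. This is a hypothetical illustration, not the tool's implementation: the function names, the score range of -1 to 1, the default threshold of 0.5, and the toy data are all assumptions.

```python
from statistics import mean


def team_baseline(daily_scores_by_dev):
    """Mean of each developer's mean daily polarity over the period."""
    return mean([mean(scores) for scores in daily_scores_by_dev.values()])


def deviates_from_baseline(recent_scores, baseline, threshold=0.5):
    """True when the recent mean polarity differs from the developer's
    baseline by more than `threshold` (hypothetical default)."""
    return abs(mean(recent_scores) - baseline) > threshold


def longest_negative_run(daily_scores):
    """Length of the longest stretch of consecutive days with negative polarity."""
    longest = current = 0
    for score in daily_scores:
        current = current + 1 if score < 0 else 0
        longest = max(longest, current)
    return longest


# Toy daily polarity scores in [-1, 1], invented for illustration only.
daily = {
    "dev_a": [0.5, 0.5, 0.5, 0.5],
    "dev_b": [-0.5, -0.5, 0.5, 0.5],
}

print(team_baseline(daily))  # 0.25
print(deviates_from_baseline([-0.5, -0.5], baseline=0.5))  # True
print(longest_negative_run([0.2, -0.1, -0.3, -0.2, 0.4, -0.5]))  # 3
```

A positive area could be detected the same way by counting consecutive days with a score above zero; weekly aggregation would only change the granularity of the input list.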

Threats to validity

Conclusion and future work

Additional Information and Declarations

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Leo Silva conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, and approved the final draft.

Marília Gurgel de Castro analyzed the data, prepared figures and/or tables, and approved the final draft.

Miriam Bernardino Silva analyzed the data, prepared figures and/or tables, and approved the final draft.

Milena Santos analyzed the data, prepared figures and/or tables, and approved the final draft.

Uirá Kulesza analyzed the data, authored or reviewed drafts of the article, and approved the final draft.

Margarida Lima conceived and designed the experiments, authored or reviewed drafts of the article, and approved the final draft.

Henrique Madeira conceived and designed the experiments, authored or reviewed drafts of the article, and approved the final draft.

Ethics

The following information was supplied relating to ethical approvals (i.e., approving body and any reference numbers):

The Research Ethics and Deontology Committee of the Faculty of Psychology of the University of Coimbra unanimously approved the study.

Data Availability

The following information was supplied regarding data availability:

The source code is available at GitHub and Zenodo:

https://github.com/leosilva/peerj_computer_science_2022/releases/tag/v1.0.

Leo Silva. (2023). Source code for article “Using social media and personality traits to assess software developers’ emotional polarity”. Zenodo. https://doi.org/10.5281/zenodo.7864671.

The data is available at Zenodo:

Leo Silva, Marília Gurgel de Castro, Miriam Bernardino Silva, Milena Santos, Uirá Kulesza, Margarida Lima, & Henrique Madeira. (2023). Using social media and personality traits to assess software developers' emotional polarity [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7846996.

Funding

This work was supported by the Grant CISUCUID/CEC/00326/2020, funded by the European Social Fund, through the Regional Operational Program Centro 2020. There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
