
Module-I

Introduction to Social Media Analytics


TOPICS

• Social media and its importance

• various social media platforms

• social media mining

• challenges for social media mining

• social media mining techniques


• graph mining and text mining

• the generic process of social media mining


Social media and its importance
• Social Media is a way of communication using online tools such as Twitter,
Facebook, LinkedIn, and so on.
• Definition
A group of Internet-based applications that build on the ideological and technological foundations of
Web 2.0 and that allow the creation and exchange of user-generated content

• Social media spans many Internet-based platforms that facilitate human activities such as:
• Networking, for example, Facebook, LinkedIn, and so on
• Microblogging, for example, Twitter, Tumblr, and so on
• Photo sharing, for example, Instagram, Flickr, and so on
• Video sharing, for example, YouTube, Vimeo, and so on
• Stack exchanging, for example, Stack Overflow, GitHub, and so on
• Instant messaging, for example, WhatsApp, Hike, and so on
• Today's corporate marketing departments are maturing in their understanding of the promise and impact of social media.

• In the early years, social media was perceived as yet another broadcasting medium for
publishing banner advertisements into the world.

• While social media is a great tool for banner advertisements in terms of cost and reach, it is not limited to that.

• There is another use of social media that can turn out to be more influential in the long term:

• Understanding current and potential customers

• Gauging the opinions of consumers

• Guiding business decisions


• Current Customer Relationship Management (CRM) systems provide marketing judgments using a mixture of demographics, past buying patterns, and other prior actions.

• Every minute of every day, Facebook, Twitter, LinkedIn, and other online
communities generate enormous amounts of this data.
Various social media platforms

• Social media is not restricted to email, chat, or media sharing; it is a collection of a larger group of content-generating platforms such as:
• Blogs

• Microblogs

• Social news

• Social bookmarking

• Professional groups

• Community-based questions and answers

• Wikis
Social media mining

• Social media mining is the systematic analysis of information generated from social media.

• This data stream is a prime example of Big Data.

• The set of tools and techniques used to mine such information is collectively called data mining; in the context of social media, it is called social media mining (SMM).
• SMM can generate insights about how much someone is influencing others on
the Web.

• SMM can help businesses identify their customers' pain points in real time

• SMM can help us identify potential customers based on their own online activities and those of their friends.
Research in Multiple Disciplines

• Social media mining builds on research from multiple disciplines.

• Why does social media mining matter?

• If you can measure it, you can improve it

• Modeling behavior

• Predictive analysis

• Recommending content
Challenges for social media mining

• Social media mining draws its roots from many fields, such as
• statistics,

• machine learning,

• information retrieval,

• pattern recognition,

• and bioinformatics.
Important challenges
• Big Data: the sheer volume of social media data
• Data sufficiency: limited or biased samples can lead to overfitting or underfitting problems
• Noise removal: it is a very tricky business, and removing noise can also remove important information
• Evaluation dilemma: ground truth for evaluating results is often unavailable
• Lack of properly annotated datasets to train supervised machine learning algorithms
• Most algorithms depend heavily on domain expertise
Social media mining techniques
Graph mining
• Network graphs make up the dominant data structure and appear, essentially, in all forms of social media data/information.

• Graph mining can be described as the process of extracting useful knowledge (patterns, outliers, and so on) from the social relationships between community members, which can be represented as a graph.

• Example: Facebook Graph Search.
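• As an illustration, a minimal graph-mining sketch using the igraph R package is shown below; the friendship edge list, the member names, and the choice of centrality and community measures are made up purely for illustration.

library(igraph)

# Each row is a "friendship" edge between two community members (hypothetical data)
edges <- data.frame(
  from = c("alice", "alice", "bob",   "carol", "dave"),
  to   = c("bob",   "carol", "carol", "dave",  "eve")
)

# Represent the social relationships as an undirected graph
g <- graph_from_data_frame(edges, directed = FALSE)

# Extract simple patterns from the graph
degree(g)          # how many connections each member has
betweenness(g)     # members that bridge otherwise separate groups
cluster_louvain(g) # community detection: densely connected subgroups

plot(g)            # quick visual inspection of the network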
Text mining

• Extraction of meaning from unstructured text data present in social media


is described as text mining.

• The primary targets of this type of mining are blogs and micro blogs such
as Twitter.

• It is also applicable to other social networks, such as Facebook, that contain links to posts, blogs, and other news articles.
The generic process of social media mining

• Any data mining activity follows some generic steps to gain some useful insights from the data.

• Example: data from Twitter

• Getting authentication from the social website

• Data visualization

• Cleaning and pre-processing

• Data modeling using standard algorithms such as opinion mining, clustering, anomaly/spam

detection, correlations and segmentations, recommendations

• Result visualization
Getting authentication from the social website – OAuth 2.0

• Most social media websites provide API (Application Programming Interface) access to their data.

• To do the mining, we (as a third-party) would need some


mechanism to get access to users' data, available on these websites.

• OAuth is used for authenticating the third party access.

• OAuth - An open protocol to allow secure authorization in a simple


and standard method from web, mobile and desktop applications.
• OAuth 2.0 provides various methods in which different levels of authorization over the various resources can reliably be granted to the requesting client application.
Steps for OAuth2.0 Authentication

• The client accesses the web app with the button Login via Twitter (or
Login via LinkedIn or Login via Facebook).

• This takes the client to an app, which will authenticate it.

• The client is then redirected to a redirect link via the authenticating app.

• Usually, the redirect link is set up when the client app is registered with the authenticating app.

• Using the redirect link, the client is returned to the client app's website. A minimal sketch of this flow in R follows.
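• A minimal sketch of this flow from R, using the httr package; the app name, the client key/secret placeholders, and the Graph API request are illustrative assumptions, not working credentials.

library(httr)

# Credentials obtained by registering the client app with the provider
# (placeholders, not real values)
app <- oauth_app("my_social_app",
                 key    = "CLIENT_ID_PLACEHOLDER",
                 secret = "CLIENT_SECRET_PLACEHOLDER")

# httr ships ready-made endpoint definitions for several providers,
# e.g. "facebook", "linkedin", "github"
endpoint <- oauth_endpoints("facebook")

# Opens the browser, lets the user grant access via the redirect link,
# and returns an access token for the client app
token <- oauth2.0_token(endpoint, app, cache = FALSE)

# The token is then attached to subsequent API requests
resp <- GET("https://graph.facebook.com/me", config(token = token))
content(resp)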
Differences between OAuth and OAuth 2.0

• More flows in OAuth 2.0 to permit improved support for non-browser


based apps

• OAuth 2.0 does not need the client app to have cryptography

• OAuth 2.0 offers much less complicated signatures

• OAuth 2.0 generates short-lived access tokens, hence it is more secure

• OAuth 2.0 has a clearer segregation of roles concerning the server


responsible for handling user authorization and the server handling
OAuth requests
Data visualization R packages
• A number of visualization packages for text data are available in R.

• Depending on the available data and the objective, these libraries provide options ranging from simple clusters of words to visualizations aligned with semantic analysis or topic modeling of the corpus.

The simple word cloud

• One of the simplest and most frequently used visualization libraries is the simple word cloud

• The "wordcloud" R library helps the user get an understanding of weights of a word/term with
respect to the tf-idf (term frequency–inverse document frequency)matrix.

• The weights are proportional to the size and color of the word you see in the plot
Word-cloud in R

• The text mining package (tm) and the word cloud generator package (wordcloud) are available
in R for helping us to analyze texts and to quickly visualize the keywords as a word cloud.

• The 5 main steps to create word clouds in R (a compact sketch of these steps follows the list):


• Step 1: Create a text file

• Step 2 : Install and load the required packages

• Step 3 : Text mining

• Step 4 : Build a term-document matrix

• Step 5 : Generate the Word cloud
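• A compact sketch of these five steps using the tm and wordcloud packages; the sample posts are made up, and plain term frequencies are used here instead of tf-idf weights.

library(tm)          # Step 2: load the required packages (assumed already installed)
library(wordcloud)
library(RColorBrewer)

# Step 1: the text, here a small in-memory vector standing in for a text file
posts <- c("Social media mining extracts insight from social media data",
           "Text mining extracts meaning from unstructured text",
           "Graph mining analyzes relationships between community members")

# Step 3: text mining, i.e. build a corpus and clean it
corpus <- Corpus(VectorSource(posts))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# Step 4: build a term-document matrix and derive term frequencies
tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# Step 5: generate the word cloud, with word size proportional to frequency
wordcloud(words = names(freq), freq = freq, min.freq = 1,
          colors = brewer.pal(8, "Dark2"), random.order = FALSE)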

Link

http://www.sthda.com/english/wiki/text-mining-and-word-cloud-fundamentals-in-r-5-simple-steps-you-s
hould-know

https://towardsdatascience.com/create-a-word-cloud-with-r-bde3e7422e8a
Sentiment analysis Word cloud

• There are R packages that can generate a word cloud similar to the preceding one, along with the sentiment each word represents.

• This is one step ahead of the basic word cloud.

• It gives an understanding of what kinds of sentiments are present:

• joy, sadness, disgust, love, and so on

• Timothy Jurka developed one such package (sentiment)

• The two main functions of this package are:

• classify_emotion()

• classify_polarity()
classify_emotion():

• As the name suggests, the procedure helps the user understand the type of sentiment that is present.

• A voting-based classification is one of the algorithms used in this particular procedure

• The Naive Bayes algorithm is also used for more enhanced results.

• The training dataset used on the above algorithms is from Carlo Strapparava and Alessandro Valitutti.

classify_polarity():

• This procedure indicates the overall polarity of the emotions (positive or negative).

• This is, in a way, an extension of the classify_emotion procedure.

• The training data used here comes from Janyce Wiebe's subjectivity lexicon. A short usage sketch of both functions follows.
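• A hedged usage sketch of both functions, assuming Timothy Jurka's sentiment package (archived from CRAN, so it has to be installed from the CRAN archive together with its dependencies such as tm and Rstem); the sample texts are made up and the exact columns of the returned matrices may vary by version.

library(sentiment)

texts <- c("I love this new phone, it makes me so happy",
           "This update is terrible and really annoying")

# Emotion classification (voting-based or naive Bayes)
emotions <- classify_emotion(texts, algorithm = "bayes")
head(emotions)   # per-text scores for joy, sadness, anger, etc., plus a best fit

# Overall polarity (positive vs. negative)
polarity <- classify_polarity(texts, algorithm = "bayes")
head(polarity)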
• The most commonly used visualization library for Facebook data is Gephi.

• Gephi is a highly customizable and user-friendly tool.

• Many more R packages are available to visualize most social media data:

• http://rcytoscape.systemsbiology.net/versions/current/index.html

• http://cran.us.r-project.org/web/packages/sna/index.html

• http://statnetproject.org/
The most important functions in the SentimentAnalysis package are:
• Compute sentiment scores from contents stored in different formats with analyzeSentiment().

• If desired, convert the continuous scores to either binary sentiment classes (negative or positive) or tertiary directions (negative, neutral or positive). This conversion can be done with convertToBinaryResponse() or convertToDirection() respectively.
• Compare the calculated sentiment scores with a baseline (i.e. a gold standard). Here, compareToResponse() performs a statistical evaluation, while plotSentimentResponse() enables a visual comparison.

• Generate customized dictionaries with the help of generateDictionary() as part of an advanced analysis. However, this requires a response variable (i.e. the baseline). A short sketch using these functions follows.
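• A short sketch using these functions from the SentimentAnalysis package; the example sentences and the hand-labelled response vector are made up, and the QDAP dictionary column is just one of several scores returned.

library(SentimentAnalysis)

documents <- c("The new feature is fantastic and works great",
               "The service was slow and the support unhelpful",
               "Quite a decent release overall",
               "Worst update ever, full of bugs")

# Compute sentiment scores with the built-in dictionaries
sentiment <- analyzeSentiment(documents)

# Convert the continuous scores into positive/neutral/negative directions
convertToDirection(sentiment$SentimentQDAP)

# Compare against an assumed gold standard (hand-labelled polarity)
response <- c(1, -1, 1, -1)
compareToResponse(sentiment, response)
plotSentimentResponse(sentiment$SentimentQDAP, response)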

Link
https://cran.r-project.org/web/packages/SentimentAnalysis/vignettes/SentimentAnalysis.html
R Packages

• R is one of the most popular languages for data science.

• There are many packages and libraries provided for doing different tasks.

• dplyr and data.table for data manipulation

• ggplot2 for data visualization

• tidyr for data cleaning

• shiny for creating web applications

• knitr for report generation

• mlr3, xgboost, and caret for machine learning

Link

https://www.datacamp.com/community/tutorials/top-ten-most-important-packages-in-r-for-data-science
Preprocessing and cleaning in R
• Data Cleaning is the process of transforming raw data into consistent data that can be
analyzed

• Data cleaning may profoundly influence the statistical statements based on the data.

• R has a set of comprehensive tools that are specifically designed to clean data in an effective
and comprehensive manner.

STEP 1: Initial Exploratory Analysis


• The first step to the overall data cleaning process involves an initial exploration of the data frame that
you have just imported into R.

• It is very important to understand how you can import data into R and save it as a data frame.
STEP 2: Visual Exploratory Analysis

• There are two types of plots that you should use during your cleaning process: the histogram and the boxplot.

1. Histogram
• The histogram is very useful in visualizing the overall distribution of a numeric column.

• We can determine whether the distribution of the data is normal, unimodal, bimodal, or any other kind of distribution of interest.

2. BoxPlot
• Boxplots are very useful because they show the median (second quartile) along with the first and third quartiles.
• BoxPlots are the best way of spotting outliers in your data frame.
STEP 3: Correcting the errors!
This step focuses on the methods that you can use to correct the errors you have found. A small base-R sketch of all three steps follows.
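• A small base-R sketch of the three steps; the file tweets.csv and the retweet_count column are hypothetical stand-ins for whatever data frame you have imported.

# STEP 1: initial exploratory analysis of the imported data frame
df <- read.csv("tweets.csv", stringsAsFactors = FALSE)  # hypothetical file
str(df)             # column types and a preview of the values
summary(df)         # basic statistics per column
colSums(is.na(df))  # number of missing values in each column

# STEP 2: visual exploratory analysis of a numeric column (assumed: retweet_count)
hist(df$retweet_count, main = "Distribution of retweet_count")
boxplot(df$retweet_count, main = "Outliers in retweet_count")

# STEP 3: correcting the errors, e.g. imputing missing values with the median
med <- median(df$retweet_count, na.rm = TRUE)
df$retweet_count[is.na(df$retweet_count)] <- med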

http://dataanalyticsedge.com/2018/05/02/data-cleaning-using-r/
https://cran.r-project.org/doc/contrib/de_Jonge+van_der_Loo-Introduction_to_data_cleaning_with_R.pdf
Data Pre-Processing With Caret in R
The caret package in R provides a number of useful data transforms:
• Standalone
• Transforms can be modeled from training data and applied to multiple datasets.

• The model of the transform is prepared using the preProcess() function and applied to a


dataset using the predict() function.

• Training

• Transforms can be prepared and applied automatically during model evaluation.

• Transforms applied during training are prepared using the preProcess() function and passed to the train() function via the preProcess argument. A brief sketch of both modes follows.
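• A brief sketch of both modes on the built-in iris data; the dataset, the center/scale transforms, and the k-nearest-neighbours model are chosen purely for illustration.

library(caret)
data(iris)

# Standalone: estimate the transform from data, then apply it with predict()
pp <- preProcess(iris[, 1:4], method = c("center", "scale"))
iris_scaled <- predict(pp, iris[, 1:4])
summary(iris_scaled)

# Training: let train() prepare and apply the transform during model fitting
fit <- train(Species ~ ., data = iris, method = "knn",
             preProcess = c("center", "scale"))
fit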
Link
https://machinelearningmastery.com/pre-process-your-dataset-in-r/
Summary of Transform Methods

• Below is a quick summary of all of the transform methods supported in the method argument of the preProcess() function in caret (a short example combining several of them follows the list).

• "BoxCox": apply a Box-Cox transform, values must be non-zero and positive.

• "YeoJohnson": apply a Yeo-Johnson transform, like a BoxCox, but values can be negative.

• "expoTrans": apply a power transform like BoxCox and YeoJohnson.

• "zv": remove attributes with a zero variance (all the same value).

• "nzv": remove attributes with a near zero variance (close to the same value).

• "center": subtract mean from values.

• "scale": divide values by standard deviation.

• "range": normalize values.

• "pca": transform data to the principal components.

• "ica": transform data to the independent components.

• "spatialSign": project data onto a unit circle.
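• For example, several of the listed methods can be combined in a single call (iris is again used only as a stand-in dataset; caret decides internally in which order the transforms are applied):

library(caret)
data(iris)

# Power transform, centering/scaling, and PCA in one preProcess() call
pp <- preProcess(iris[, 1:4], method = c("YeoJohnson", "center", "scale", "pca"))
print(pp)                      # summarizes which transforms were estimated
iris_pca <- predict(pp, iris[, 1:4])
head(iris_pca)                 # columns are now principal components (PC1, PC2, ...)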
