
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 07 Issue: 12 | Dec 2020 www.irjet.net p-ISSN: 2395-0072

Google Play Store Apps- Data Analysis and Ratings Prediction

S Shashank1, Brahma Naidu2
1Student ICFAI Tech, Hyderabad
2Professor, Dept. of Computer Science, ICFAI Tech, Hyderabad, Telangana, India.
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - The Google Play Store receives a few thousand new applications regularly, with a growing number of developers working independently or in teams to make them successful amid enormous competition from across the globe. Since most Play Store applications are free, the revenue model is quite obscure, and it is unclear how in-app purchases, adverts and subscriptions contribute to the success of an application. An application's success is therefore usually determined by the number of installations and the user ratings it has received over its lifetime rather than by the revenue it generates. Application (app) ratings are feedback provided voluntarily by users and serve as important evaluation criteria for apps. However, these ratings can often be biased due to insufficient or missing votes. Additionally, significant differences are observed between numeric ratings and user reviews. This study aims to predict the ratings of Google Play Store apps using machine learning algorithms. I have performed data analysis and prediction on the Google Play Store application dataset that I collected from Kaggle. Using machine learning algorithms, I have tried to discover the relationships among various attributes present in my dataset, such as whether an application is free or paid, its user reviews, and its rating.

Key Words: Google Play Store Apps, Ratings Prediction, Exploratory Data Analysis, Machine Learning.

1. INTRODUCTION

Machine learning approaches are essential for solving numerous problems. In this paper, we present machine learning models and structures in detail. Machine learning has applications in many areas and has great development potential.

In the future, it is foreseeable that machine learning could establish sound theories to explain its performance. Meanwhile, its unsupervised learning capabilities will be improved, since there is a great deal of data in the world but it is not practical to add labels to all of it. It is also anticipated that neural network structures will become increasingly complex so that they can extract more semantically meaningful features. In addition, deep learning will combine better with reinforcement learning, and we can use these advantages to accomplish more tasks.

1.1 Analysis and Prediction

In today's scenario, mobile apps play an important role in every individual's life. The growth of the mobile application market has had a great impact on digital technology. With the ever-growing mobile application market there is also a notable rise in the number of mobile application developers, eventually resulting in very high revenue for the global mobile application industry.

With enormous competition from across the globe, it is essential for developers to know that they are proceeding in the right direction. To retain this revenue and their place in the market, application developers may need to find a way to hold on to their present position. The Google Play Store is found to be the largest application platform. It has been observed that although it generates more than double the downloads of the Apple App Store, it makes only half the money compared to the App Store. I therefore scraped data from the Play Store to conduct our analysis on it.

With the rapid growth of smartphones, mobile applications (mobile apps) have become essential parts of our lives. However, it is difficult for us to keep track of and understand everything about the apps, as new applications enter the market every day. It is reported that the Android market reached half a million applications in September 2011. As of now, 0.675 million Android applications are available on the Google Play app store. Such a large number of applications seems to be a great opportunity for users to buy from a wide selection. We believe mobile application users consider online application reviews a major influence for paid applications. It is challenging for a potential user to read all the textual comments and ratings to make a decision. Additionally, application developers have difficulty finding out how to improve application performance based on overall ratings alone and would benefit from understanding the many thousands of textual comments.

1.2 Google Play store Dataset

The dataset consists of Google Play Store applications and is taken from Kaggle, which is the world's largest community for data scientists to explore, analyze and share data.

© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 265

This dataset contains web-scraped data for 10k Play Store applications and is used to analyze the Android market. It is a downloaded dataset that a user can use to examine the Android market across different categories such as music, camera, etc. With its help, a user can predict whether any given application will get a lower or higher rating. The dataset can also be used as a future reference when recommending an application. Additionally, an offline dataset was chosen so that estimates can be made precisely, since online data is refreshed much of the time. With the help of this dataset I will examine various attributes such as rating and free or paid using Hive, and after that I will also predict various attributes such as user reviews, rating, etc.

1.3 Data Mining

Data mining is the process of rummaging through a data set and finding correlations, anomalies and/or patterns that will be useful. In other words, it means taking a large dataset filled with scattered information and trying to make sense of it by finding meaning.

1.4 Python

Most data scientists use Python because of its good built-in library functions and its strong community. Python now has 70,000 libraries. Python is one of the simplest programming languages to pick up compared to other languages. That is the main reason data scientists use Python more often: for machine learning and data processing, data analysts want a language that is straightforward to use. Specifically, for data scientists the most popular open-source data library is pandas. As we have seen in our previous assignment, when we need to plot scatterplots, heat maps, graphs, or 3-dimensional data, Python's libraries are very helpful.

1.5 Machine Learning

Machine learning is an application of artificial intelligence (AI) that gives systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

1.5.1 Supervised Learning

It is defined as learning in which we train a machine on our dataset or input. From that point on, the machine is given a new set of examples (data), so that supervised learning analyses the provided data (the set of training examples) and produces a correct result from the given input.

1.5.2 Unsupervised Learning

In unsupervised learning we do not train the machine on labelled data or input. This means there is no supervisor acting as a teacher in this kind of learning. Instead, we allow the algorithm to work on its own without any training or guidance. The machine works by finding definite patterns and similarities in the given dataset. The machine's task is therefore limited to finding the structure hidden in the given dataset.

1.5.3 Semi-supervised Learning

This type of learning lies between the above two learning methods.

1.6 Neural Networks

Neural networks can be divided into different forms such as artificial neural networks, deep neural networks, recurrent neural networks, and convolutional deep neural networks. Each form has its own importance and its own features. A neural network has an input layer, a number of hidden layers, and an output layer.

Fig -1: Neural Network

1.6.1 Deep Neural Network

A deep neural network (DNN) is defined as a neural network with a certain level of complexity, namely one that contains more than two layers. A deep neural network uses mathematical modelling to process data in complex ways.

A neural network, in general, is a technology built to simulate the activity of the human brain, specifically pattern recognition and the passage of input through various layers of simulated neural connections.

In a DNN, data flows forward, that is, from the input layer to the output layer without any loops. At first, the DNN creates a map of virtual neurons and assigns random numerical values, or "weights", to the connections between them. The weights and inputs are multiplied and return an output between 0 and 1. If the


network did not accurately recognize a particular pattern, an algorithm would adjust the weights. That way the algorithm can make certain parameters progressively

Fig -2: Neural Network Diagram

2. LITERATURE SURVEY

There has been a constant growth in the public and private information stored within the internet. This includes textual data expressing people's opinions on review sites, forums, blogs, and other social media platforms. Review-based prediction systems allow this unstructured information to be automatically transformed into structured data reflecting public opinion. These structured data can be used subsequently as a measure of users' sentiments about specific applications, products, services, and brands. They can hence provide important information for product and service refinement. This kind of sentiment analysis was conducted in the following studies.

- Kumari and other researchers [8,9,10,5] used the Naïve Bayes (NB) classifier to classify opinions as positive, negative, or neutral.

- Wang and others [11] argued that a rating is not entirely determined by the review content. For example, a user may well intend to give a positive review by employing positive words, and yet issue a comparatively lower rating.

- Dave and others [12] proposed a method for extracting the polarity in user reviews of products, expressed as poor, mixed, or good. The classifier used was Naïve Bayes (NB).

- According to Pang et al. [13], although machine learning approaches perform far better for traditional topic-based categorization, they are less successful for sentiment analysis.

- Information-extraction technologies have also been explored to identify and organize opinions contained in text. For example, some authors [14] proposed a scheme for annotating a low-level representation of opinions within a text. Additionally, they described an opinion-oriented "scenario template" that summarizes the opinions expressed in a document. This approach is helpful for tasks that involve posing questions from multiple perspectives.

- Other authors [15] suggested adopting a statistical analysis based on a spin model to extract the semantic orientations of words. Mean field approximations were used to compute the approximate probability in the spin model. Semantic orientations are then evaluated as desirable or undesirable. A smaller number of seed words for the proposed model produces highly accurate semantic orientations based on the English lexicon.

- Various sentiment analysis methods have been performed to summarize the ensembles of comments and reviews [16]. These methods use mathematical and statistical methods (especially involving Gaussian distributions) to overcome the problems encountered in sentiment analysis. Although these authors proposed a model, it was not implemented.

- A recent study [17] investigated the application of a machine learning algorithm to a dataset covering, for example, the app category, the numbers of reviews and downloads, the size, type, and Android version of an app, and the content rating, to predict a Google app ranking. Decision trees, linear regression, logistic regression, support-vector machine, NB classifiers, k-means clustering, k-nearest neighbors, and artificial neural networks were studied for that purpose.

- App ratings have been predicted based on the features provided for an app [18,19]. Experiments were performed on the BlackBerry World and Samsung Android stores to collect the raw features provided for the apps, including their price, rank of downloads, ratings, and textual descriptions. The features were then encoded into a numerical vector to be used in case-based reasoning and to predict the app rating.

- In contrast to the above-cited studies, other authors [20] investigated the nature of sentiments expressed in Google app reviews. Their study measured opinions and sentiments represented in user reviews through a variety of emojis expressing, for example, negativity, positivity, anger, or excitement. It evaluated whether those sentiments are informative for the purpose of app development and refinement.

However, the above studies are unsatisfactory in various respects and are unsuitable for predicting numeric ratings of Google apps. First, text-mining techniques are ineffective when applied to app reviews, as these use Unicode-supported language with a limited number of words. Second, those studies are based either on rating predictions made using inherent app features or on external features (e.g., price, bug report, etc.). None of those studies investigated the possible discrepancies between users' numeric ratings and reviews. To our knowledge, this study is the first to investigate such discrepancies and to base numeric-rating predictions for Google apps.

3. EXPLORATORY DATA ANALYSIS

3.1 Free vs Paid

Fig -3: Free vs Paid

Here we can see that 92.6% of apps are free and 7.38% of apps are paid on the Google Play Store, so most of the apps on the Google Play Store are free.

3.2 Updated Apps

In the plot below, we plotted the apps updated or added over the years, comparing free vs. paid. From this plot we can conclude that before 2011 there were no paid apps, and that over the years more free apps have been added than paid apps. Comparing the apps updated or added in 2011 and in 2018, free apps increase from 80% to 96% and paid apps go from 20% to 4%. So we can conclude that most people are after free apps.

Fig -4: Updated Apps

3.3 Updated Free Apps

In this data, almost 50% of apps were added or updated in the month of July, 25% were updated or added in the month of August, and the remaining 25% in the other months.

Fig -5: Updated Free Apps

3.4 Updated Paid Apps

As with free apps, most of the paid apps were also updated in the month of July.

Fig -6: Updated Paid Apps
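As a minimal sketch of how the free versus paid shares discussed in Section 3.1 could be computed, the following Python snippet counts the Type attribute over a handful of made-up rows. The sample rows and the helper name type_shares are assumptions for illustration only; they are not taken from the Kaggle dataset or from the authors' notebook.

```python
from collections import Counter

# Hypothetical sample rows (app name, Type); not real dataset values.
apps = [
    ("App A", "Free"),
    ("App B", "Free"),
    ("App C", "Paid"),
    ("App D", "Free"),
]


def type_shares(rows):
    """Return the percentage of apps for each Type value (e.g. Free, Paid)."""
    counts = Counter(app_type for _, app_type in rows)
    total = sum(counts.values())
    return {app_type: 100.0 * n / total for app_type, n in counts.items()}


shares = type_shares(apps)
print(shares)  # {'Free': 75.0, 'Paid': 25.0}
```

The same counting could be repeated per year or per month of the last update to obtain the kind of breakdown described in Sections 3.2 to 3.4, assuming an update date is available for each app.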


3.5 Ratings

Most of the ratings given on the Google Play Store are for free apps.

Fig -7: Ratings

3.6 Free App content Rating

Fig -8: Free App Rating

3.7 Paid App Content Rating

Fig -9: Paid App Ratings

3.8 Ratings of Free vs Paid Apps

Free apps are the most rated apps on the Google Play Store compared to paid apps.

Fig -10: Free vs Paid App Ratings

3.9 Free Apps Ratings

Fig -11: Free App Ratings

3.10 Paid App Ratings

Most of the paid apps on the app store are rated 4.2 to 4.8.

Fig -12: Paid App Ratings

3.11 Category of Apps

From the chart below we can see that most of the apps on the Google Play Store belong to the Family, Game and Tools categories.


Fig -13: Paid App Ratings

3.12 Android Version

Most of the apps on the Google Play Store require Android version 4.1 and up.

Fig -14: Paid App Ratings

3.13 Number of installations

In the plot below, the most common install counts are above 1 million and then 10 million; very few apps cross 500M or the dream install count of 1B. Some apps such as Instagram, YouTube, Facebook, WhatsApp, etc. cross the dream install count of 1B.

Fig -15: Paid App Ratings

3.14 Content Rating

The apps that are available for everyone have ratings of 4 and above out of 5.

Fig -16: Paid App Ratings

3.15 Ratings over the Android Version

Apps for Android version 4.1 and above have ratings of 4 and above.

Fig -17: Paid App Ratings

3.16 Category wise Rating

The Family, Game and Tools categories have the highest ratings, i.e. 4 and above.

Fig -18: Paid App Ratings
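The group-wise comparisons in Sections 3.14 to 3.16 amount to averaging the Rating attribute within each group. The sketch below shows one way this might be done for the Category attribute using only the Python standard library. The rows and the helper name mean_rating_by_group are hypothetical and serve only to illustrate the calculation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows (Category, Rating); values are illustrative only.
rows = [
    ("FAMILY", 4.2),
    ("FAMILY", 4.4),
    ("GAME", 4.5),
    ("TOOLS", 3.9),
    ("TOOLS", 4.1),
]


def mean_rating_by_group(pairs):
    """Group ratings by key and return the mean rating of each group."""
    groups = defaultdict(list)
    for key, rating in pairs:
        groups[key].append(rating)
    return {key: mean(values) for key, values in groups.items()}


averages = mean_rating_by_group(rows)
for category, avg in sorted(averages.items()):
    print(f"{category}: {avg:.2f}")
```

Replacing the Category key with Content Rating or the Android version would give the corresponding averages for the other sections, under the same assumptions.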


3.17 Ratings over Installations

The apps that have 1M and 10M installations have ratings of 4 and above.

Fig -19: Paid App Ratings

4. DATA PREPROCESSING

The dataset collected from the Google Play store is semi-structured or unstructured and contains significant superfluous data (defined as not contributing significantly to the prediction process). Since large datasets require longer training times, and because "stop words" reduce the prediction accuracy, text preprocessing is required to overcome this limitation. Preprocessing involves various tasks including stemming, lowercase conversion, punctuation removal, and excluding terms.

4.1 DESCRIPTION OF DATA

Attribute        Description
App              Application
Category         Category of the Application
Rating           Overall user rating of the app
Reviews          Number of user reviews for the app
Size             Size of the App
Installs         Number of user downloads/Installs
Type             Paid or Free
Price            Price of the app
Content Rating   Age group of app - Children/Adult
Genres           App's Genre

Table -1: Attributes

4.2 SOFTWARE

We use Anaconda Navigator to launch Jupyter Notebook. Then we choose Python 3 for coding.

4.3 DATA PREPARATION

4.3.1 PRE PROCESSING

Preprocessing is important for transitioning raw data into a more desirable format. Undergoing the preprocessing process can help with completeness and compellability. For instance, you can see whether certain values were recorded or not. You can also see how trustworthy the data is. It can also help with finding how consistent the values are. We need preprocessing because most real-world data are dirty. Data can be noisy, i.e. the data can contain outliers or simply errors in general. Data can also be incomplete, i.e. there can be some missing values.

4.3.2 FEATURE SELECTION

Training a supervised machine learning algorithm requires textual documents to be represented in vectorial form. For this purpose, textual data must be converted into numbers without losing information.

4.3.3 NORMALIZATION

The terms normalization and standardization are sometimes used interchangeably, but they usually refer to different things. Normalization usually means scaling a variable to have a value between 0 and 1, while standardization transforms data to have a mean of zero and a standard deviation of 1. Normalization makes training less sensitive to the scale of features, so we can better solve for coefficients. Standardizing tends to make the training process well behaved because the numerical condition of the optimization problems is improved. We did not normalize or standardize this dataset, as it was not necessary.

4.4 STANDARD DEVIATION

The standard deviation is the square root of the variance. It is one of the ways to measure the spread of the data. To calculate the standard deviation, first find the mean. Then subtract the mean from each number and square the result. Then take the mean of these squared differences. Finally, take the square root of that mean, and we are done.

The plots in Fig -20 can help us learn more about our data.
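To make the definitions in Sections 4.3.3 and 4.4 concrete, the following sketch implements the standard deviation steps (mean, squared differences, mean of squares, square root) together with min-max normalization and standardization. It uses the population form of the standard deviation, which matches the steps described above. The function names and the sample values are assumptions chosen for illustration, and the paper itself states that neither scaling was applied to the dataset.

```python
import math


def std_dev(values):
    """Population standard deviation: square root of the mean squared deviation."""
    m = sum(values) / len(values)
    variance = sum((v - m) ** 2 for v in values) / len(values)
    return math.sqrt(variance)


def normalize(values):
    """Min-max scaling to [0, 1]; assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale to mean 0 and standard deviation 1; assumes a non-zero spread."""
    m = sum(values) / len(values)
    s = std_dev(values)
    return [(v - m) / s for v in values]


# Hypothetical sample values, chosen so the arithmetic is easy to follow:
# mean = 5, variance = 4, standard deviation = 2.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(std_dev(values))        # 2.0
print(normalize(values)[:2])  # first two scaled values, starting at 0.0
print(standardize(values))    # z-scores with mean 0 and standard deviation 1
```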


Fig -20: Whole Data Charts

4.5 CORRELATION MATRIX

Fig -21: Correlation Matrix

5. ALGORITHMS

5.1 Random Forest

Random forest regression is applied to all the variables. The results of the random forest determine the importance of all the variables and their influence on the rating. The results of random forest regression are evaluated using Mean Square Error. Random forest is the first model applied to the dataset, and the results of random forest classification are computed for a number of variables to find the importance of these variables.

5.2 Support Vector Regression

As Support Vector Regression (SVR) is a promising regression model for continuous variables, it is used to find the importance of all the numeric variables. In this model, only numeric values are used, so the importance of these variables is computed with respect to the rating. As SVR is usually used for continuous numeric data, this model is applied only to the numeric variables so that the importance of those variables can be found.

5.3 Linear Regression

The linear regression model is also used to find the importance of different variables with respect to ratings. Although linear regression is a simple regression model, it sometimes produces better results than other, more complex models. In this model, only numeric values are used. Therefore, when this model is applied to the dataset, only the numeric variables are considered.

5.4 K Nearest Neighbors

KNN is one of the easiest supervised machine learning algorithms. It is the most basic machine learning algorithm you will find in scikit-learn. We can use KNN to solve complicated problems. With the help of KNN we can do pattern recognition and data processing. KNN is based on similarity: from the given dataset, KNN finds common groups between attributes. We split the data into training and test sets. Then we can see how much similarity appears in the result.

5.5 K Means Clustering

The K-Means clustering algorithm works by comparing, or using the similarity of, every data point. It is commonly used in unsupervised learning. The K term means the number of groups clustered. We then get the centers of the clusters as the result, and we can plot these as well as our original clustered data. For K-Means, you should be an expert in your data: the number of clusters should be known before you run the algorithm. K-Means works when the data is in more of a flipped cone shape.

Algorithm             Accuracy
Random Forest         73.55%
SVR                   76.49%
Linear Regression     72.45%
K-Nearest Neighbor    92.22%
K-Means Clustering    69.56%

Table -2: Accuracy of Algorithms

6. CONCLUSIONS

After applying these algorithms and processes, we concluded that our hypothesis is true: app ratings can be predicted. However, significant preprocessing must be done before starting the classification and regression processes.

The Play Store apps data has enormous potential to drive app-making businesses to success. Actionable insights can be drawn for developers to work on and capture the Android market! This shows that given the


Size, Type, Price, Content Rating, and Genre of an app, we can predict with about 92% accuracy whether an app will have more than 100,000 installs and be a hit on the Google Play Store.

User reviews are limited to identifying polarity and subjectivity. However, the massive increase in review-based data implies a need to focus also on performing predictions. This process is challenging yet fruitful, as user reviews are qualitative while ratings are essentially quantitative. The numeric scoring of apps within the Google App store may also be biased and overrated, because higher ratings given by users potentially attract new users disproportionately. This study therefore investigated the use of ensemble classifiers to predict numeric ratings for Google Play store apps based on the user reviews for those apps. Several ensemble classifiers were investigated to gauge their performance on the reviews scraped from the Google App store. Future work includes the implementation of deep learning techniques to predict numeric ratings.

REFERENCES

[1] Statista, Number of available application in the Google Play store from December 2009 to March 2019, https://www.statista.com/statistics/266210/number-of-available-applications-in-the-google-play-store/, Online: accessed 22 May 2019.

[2] Statista, Number of mobile app downloads worldwide in 2017, 2018 and 2020 (in billions), https://www.statista.com/statistics/271644/worldwide-free-and-paid-mobile-app-store-downloads/, Online: accessed 22 May 2019.

[3] J. Horrigan, Online shopping, pew internet and American life project, Washington, DC, 2018, http://www.pewinternet.org/Reports/2008/Online-Shopping/01-Summary-of-Findings.aspx, Online: accessed 8 Aug. 2014.

[4] D. Pagano and W. Maalej, User feedback in the appstore: an empirical study, in Proc. IEEE Int. Requirements Eng. Conf. (Rio de Janeiro, Brazil), July 2013, pp. 125–134.

[5] T. Chumwatana, Using sentiment analysis technique for analyzing Thai customer satisfaction from social media, 2015.

[6] T. Thiviya et al., Mobile apps' feature extraction based on user reviews using machine learning, 2019.

[7] H. Hanyang et al., Studying the consistency of star ratings and reviews of popular free hybrid android and ios apps, Empirical Softw. Eng. 24 (2019), no. 7, 7–32.

[8] N. Kumari and S. Narayan Singh, Sentiment analysis on e-commerce application by using opinion mining, in Proc. Int. Conf. Cloud Syst. Big Data Eng. (Noida, India), Jan. 2016, pp. 320–325.

[9] R. M. Duwairi and I. Qarqaz, Arabic sentiment analysis using supervised classification, in Proc. Int. Conf. Future Internet Things Cloud (Barcelona, Spain), Aug. 2014, pp. 579–583.

[10] H. S. Le, T. V. Le, and T. V. Pham, Aspect analysis for opinion mining of vietnamese text, in Proc. Int. Conf. Adv. Comput. Applicat. (Ho Chi Minh, Vietnam), Nov. 2015, pp. 118–123.

[11] H. Wang, L. Yue, and C. Zhai, Latent aspect rating analysis on review text data: a rating regression approach, in Proc. ACM SIGKDD Int. Conf. Knowledge Discovery Data Mining (Washington, D.C., USA), July 2010, pp. 783–792.

[12] K. Dave, S. Lawrence, and D. M. Pennock, Mining the peanut gallery: Opinion extraction and semantic classification of product reviews, in Proc. Int. Conf. World Wide Web (New York, USA), 2003, pp. 519–528.

[13] B. Pang, L. Lee, S. Vaithyanathan, Thumbs up?: sentiment classification using machine learning techniques, in Proc. ACL-02 Conf. Empirical Methods Natural Language Process. (Stroudsburg, PA, USA), 2002, pp. 79–86.

[14] C. Cardie et al., Combining low-level and summary representations of opinions for multi-perspective question answering, New directions in question answering, 2003, pp. 20–27.

[15] H. Takamura, T. Inui, and M. Okumura, Extracting semantic orientations of words using spin model, in Proc. Annu. Meeting Association Comput. Linguistics (Ann Arbor, MI, USA), 2005, pp. 133–140.

[16] A. Buche, D. Chandak, and A. Zadgaonkar, Opinion mining and analysis: a survey, arXiv preprint arXiv:1307.3336, 2013.

[17] M. Suleman, A. Malik, and S. S. Hussain, Google play store app ranking prediction using machine learning algorithm, Urdu News Headline, Text Classification by Using Different Machine Learning Algorithms, 2019.

[18] F. Sarro et al., Customer rating reactions can be predicted purely using app features, in Proc. IEEE Int. Requirements Eng. Conf. (Banff, Canada), Aug. 2018, pp. 76–87.

[19] S. Aslam and I. Ashraf, Data mining algorithms and their applications in education data mining, Int. J. Adv. Res. Computer Sci. Manag. Studies 2 (2014), no. 7, 50–56.

[20] D. Martens and T. Johann, On the emotion of users in app reviews, in Proc. IEEE/ACM Int. Workshop Emotion Awareness Softw. Eng. (Buenos Aires, Argentina), May


2017, pp. 8–14.

[21] G. Hackeling, Mastering machine learning with scikit-learn, Packt Publishing Ltd, 2017.

[22] Scikit learn, Scikit-learn classification and regression models, http://scikitlearn.org/stable/supervised_learning.html#supervised-learning/, 10 Apr. 2019

BIOGRAPHIES

S SHASHANK
Student
ICFAI Tech Hyderabad

BRAHMA NAIDU
Professor
ICFAI Tech Hyderabad

