E-Commerce Fraud Detection Review
Abstract: The e-commerce industry’s rapid growth, accelerated by the COVID-19 pandemic, has led to an
alarming increase in digital fraud and associated losses. To establish a healthy e-commerce ecosystem, robust
cyber security and anti-fraud measures are crucial. However, research on fraud detection systems has
struggled to keep pace due to limited real-world datasets. Advances in artificial intelligence, Machine Learning
(ML), and cloud computing have revitalized research and applications in this domain. While ML and data mining
techniques are popular in fraud detection, specific reviews focusing on their application in e-commerce
platforms like eBay and Facebook are lacking depth. Existing reviews provide broad overviews but fail to grasp
the intricacies of ML algorithms in the e-commerce context. To bridge this gap, our study conducts a systematic
literature review using the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA)
methodology. We aim to explore the effectiveness of these techniques in fraud detection within digital
marketplaces and the broader e-commerce landscape. Understanding the current state of the literature and
emerging trends is crucial given the rising fraud incidents and associated costs. Through our investigation, we
identify research opportunities and provide insights to industry stakeholders on key ML and data mining
techniques for combating e-commerce fraud. Our paper examines the research on these techniques as
published in the past decade. Employing the PRISMA approach, we conducted a content analysis of 101
publications, identifying research gaps and recent techniques and highlighting the increasing use of artificial
neural networks in fraud detection within the industry.
Key words: E-commerce; fraud detection; Machine Learning (ML); systematic review; organized retail fraud
© The author(s) 2024. The articles published in this open access journal are distributed under the terms of the
Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
420 Big Data Mining and Analytics, June 2024, 7(2): 419−444
denial of service, phishing, malware, fraudulent e-commerce, romance scams, and tech support scams[2]. Additionally, credit card theft, money laundering, and fraudulent financial transactions are widespread in the digital age[2, 4]. These actions harm businesses and clients, posing serious risks to their finances, reputations, and mental health.

According to a recent analysis by Juniper Research, losses related to online payments on e-commerce platforms are growing at a rate of 18 percent annually[5]. This highlights the importance of studying this area to inform fraud detection or prevention strategies that can slow the upward trend. Current strategies frequently fail to keep up with fraudsters, who constantly adapt their methods to exploit the platforms[6]. Moreover, limited research and development efforts, driven by a lack of practical data and by the need for businesses to protect their platform vulnerabilities, further exacerbate the issue. For example, it makes no sense to describe fraud detection or prevention methods in the open, since doing so would arm fraudsters with the knowledge they need to avoid detection[1].

In the literature, addressing fraud of any kind can take two forms. (1) Prevention refers to steps taken to avert the occurrence of the acts in the first place. This includes intricate designs, personal identity numbers, internet security for online interactions with digital platforms, and passwords and authentication mechanisms for computers and mobile devices[7]. Prevention techniques are not perfect; frequently, a trade-off between cost (for the business) and discomfort (for the customer) must be made. (2) Detection, on the other hand, entails recognizing fraudulent acts as soon as they occur[7]. When prevention fails, detection becomes essential. For example, we can prevent credit card fraud by protecting our cards assiduously, but if the card information is stolen, we must notice the fraud as soon as possible[8]. Since neither form is perfect in reducing the risks and effects of fraud, production systems often combine the two. In this review, we limit our focus to detection systems.

There are two schools of thought regarding fraud detection systems. The first favors statistical and computational methods, and researchers in this area include Refs. [6−8]. To identify fraud, this school applies statistical tools, including ML algorithms. Typically, labeled data are used to train classifiers to distinguish between the two classes (fraudulent and non-fraudulent). This implementation feeds classifiers information from user profiles, including transaction values, day of the week, item category, age, gender, and geographic location. Those who argue against statistical and computational methods claim that these features are easy for sophisticated fraudsters to fabricate[9]. Irani, Pu, and Webb[10, 11] argue that once fraudsters discover that authorities have picked up on their jargon, they can avoid keyword traps by switching to new expressions.

The second school of thought advocates network analysis as an alternative approach to creating fraud detection features[9, 12]. To derive graph-theoretical variables or scores that specifically characterize fraudulent nodes, the approach makes use of the connectedness between the nodes, which are often users or items in a dataset. The underlying premise is that abnormal users display connection patterns that differ from those of typical users[9]. In our review, we focus on the first school of thought.

E-commerce platforms have intricate design architectures and multiple points of vulnerability (explored later in Section 4), which fraudsters and attackers could use against them. In Figs. 1 and 2, we show a commonly used e-commerce/marketplace architecture to illustrate the complexity of these platforms. At a high level, an e-commerce platform comprises three layers, as shown in Fig. 1: (1) the presentation layer, which is the part presented to the customer. It is the user interface and communication part of the architecture, where the customer interacts with the website on the front end and the application collects data and processes requests on the back end; (2) the business layer, also known as the application or service layer, which uses business logic, a specific set of business rules, to gather and process information. It can also delete, add, or change information in the data layer; (3) the data layer, also known as the database layer, which is the final layer and is used for storing data and processing requests. In light of this complex design, we posit that the statistical and computational approach (application of ML and data mining techniques) is best suited for combating fraud on these platforms. Figure 2 not only shows the detailed connections between the tiers presented in Fig. 1, but also includes third-party connections that
Abed Mutemi et al.: E-Commerce Fraud Detection Based on Machine Learning Techniques... 421
[Figure residue; recoverable labels: Client, Server, Database, E-commerce platform]
offer ancillary services on the e-commerce platform.

1.2 Problem statement

Machine learning and data mining techniques have become popular in fraud detection across many domains[13], partly because of the rapid development of artificial intelligence and the availability of affordable cloud computing technology. However, a review specifically concentrating on the use of these methods on e-commerce platforms like eBay and Facebook has not been published. Past reviews frequently use a broad brush to describe all methodologies and domains, for example, the reviews in Refs. [6, 14]. Such high-level coverage fails to produce a nuanced understanding of machine learning algorithms and their applications in the e-commerce domain.

On the other hand, the majority of the specific fraud literature reviews, such as Refs. [15–19], only cover the financial domain, such as credit card fraud. Moreover, a large number of these articles do not employ a systematic literature review methodology to support replication[20].

In this work, we acknowledge these gaps and propose a systematic literature review using the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) methodology[21] to examine the use and application of machine learning and data mining techniques for fraud detection on digital marketplaces or in the e-commerce domain. This is a crucial area given the soaring trends in fraud incidents and their associated costs[22]. Understanding the current literature and trends is essential to identifying new research opportunities as well as informing the industry on the main machine learning and data mining techniques for fraud detection in this area.

To accomplish this work, we answer four research questions as described in Section 3.1. From our methodology and corpus, we contribute to the state-of-the-art as follows:
(1) Provide an array of machine learning and data mining techniques used for fraud detection on digital marketplaces or e-commerce domains in the last decade.
(2) Highlight gaps, trends, and future research directions on the application of machine learning and
data mining techniques for fraud detection in the digital marketplace or e-commerce domain.

The remainder of this paper is organized as follows: In Section 2, we present related research. In Section 3, we present the PRISMA methodology used to compile research articles from the literature. Our literature corpus is examined in Section 4 in light of the study’s research questions. In Section 5, we go over the key findings and open issues. Finally, we conclude in Section 6.

2 Related Work

Reviews of general fraud detection have recently been published in the literature. Unam et al.[23] published a general review of articles on automated detection techniques (supervised, unsupervised, and hybrid) from the previous ten years. The authors of that review formalize the major fraud types and subtypes for a wide range of industries while presenting alternative information and solutions for each. Amir and Hamid[24] conducted another general review of articles related to fraud detection. The researchers outline five common fraud types: credit card fraud, telecom fraud, health insurance fraud, auto insurance fraud, and online auction fraud. Their work does not employ a systematic review methodology, and the review period is from 1994 to 2014.

A few reviews of a particular domain are also included in the literature. Adewumi and Akinyelu[25] used the Kitchenham approach to conduct a systematic review of the financial fraud field between 2010 and 2021. Their focus is on the use of machine learning techniques in the detection of financial fraud. Ahmed et al.’s[18] review of anomaly detection methods for fraud detection is another review in the financial domain.

The type of fraud that receives the most reviews is credit card fraud. Sorournejad et al.[26] review credit card fraud, highlight misuses of supervised and unsupervised techniques, and offer advice for new researchers.

Data mining techniques are the focus of another group of reviews. For instance, Pourhabibi et al.[15] explored the interdependency between various data objects with a focus on graph-based anomaly detection. Aziz and Ghous[27] provided another review in this area, covering data mining techniques with an emphasis on machine learning classification methods.

In Table 1, we provide a list of the articles we consider related to our work. We develop this list by seeding our search with three well-known articles in this fraud domain[6, 8, 37] and snowballing to similar articles. We prioritize the list on the basis that an article covers fraud in e-commerce or a related domain. According to the results, there are no studies that
Table 1 Related work.
Article Year Coverage Review type Domain
[23] 2010 2000−2010 Unknown General fraud
[28] 2016 − Unknown Online fraud
[24] 2016 1994−2014 Unknown General fraud
[18] 2016 − Unknown Financial fraud
[26] 2016 − Unknown Credit card fraud
[17] 2016 1997−2016 Unknown Credit card fraud using nature inspired machine learning
[29] 2017 − Systematic literature review Credit card fraud using ML
[30] 2018 − Unknown General fraud using ML
[30, 31] 2018 − Unknown Credit card fraud in e-commerce
[14] 2020 − Systematic literature review General fraud with graph-based anomaly detection
[32] 2021 − Unknown Credit card fraud with ML
[33] 2021 − Systematic literature review E-commerce
[34] 2021 − Unknown E-commerce
[35] 2021 − Systematic literature review E-commerce fake reviews
[36] 2021 − Unknown Credit card fraud
[20] 2022 − Systematic literature review e-commerce (detection and prevention)
[13] 2022 − Systematic literature review Financial fraud (Machine learning)
concentrate on fraud detection using machine learning or data mining techniques on digital marketplaces or e-commerce platforms. In the few places where e-commerce is mentioned in the domain column, the focus is on common fraud types, and little to no attention is paid to the fraud detection methods used. Additionally, the majority of the surveys do not apply a systematic literature review methodology. Our study’s main goal is to fill in these gaps.

3 Research Method

We adopt the PRISMA approach[21] to search and select articles in the scope of fraud detection in e-commerce or digital marketplaces based on machine learning or data mining techniques. The PRISMA approach generates high-quality results and supports reproducibility. It is structured in a manner that allows the identification and summarization of problems (domains), techniques, and methods used to solve the problem. The implementation of this approach follows a checklist of title, abstract, introduction, methods, results, discussion, and funding. In this structure, the title and abstract serve the same objectives as in any other approach, but the introduction must provide the rationale for the review and the questions to be addressed. Study characteristics, information sources, search strategy (including limits), the process for selecting studies, eligibility criteria, data collection, and data items are specified in the methods section[21]. The discussion involves a summary of the findings, a discussion of the limitations, and a general conclusion of the results and future work.

Systematic reviews give researchers and practitioners, who would otherwise be overwhelmed by the volume of research on a given topic, a rigorous mechanism upon which to base their decisions. There is a wide variety of literature review approaches for all kinds of topics and disciplines. In our approach, we take the following steps: (1) topic definition; (2) research question formulation; (3) keyword identification; (4) identification and search of electronic paper repositories; (5) publication assessment; (6) data acquisition and cleaning; (7)−(9) testing and revising publication; (10) production and revision of summary tables and figures; (11) draft methods; (12) and (13) evaluation and draft of key results; (14) introduction draft, abstract, and references; (15) paper revision. During the initial stages, we apply guidelines from Petticrew and Roberts[38] on how to scope a Systematic Literature Review (SLR), avoid possible biases, and synthesize the results.

3.1 Research questions

Understanding the literature on the use of machine learning and data mining techniques for fraud detection on e-commerce or digital marketplace platforms is the primary goal of this research. Research Question 3 (RQ3) ultimately encapsulates this, but to accomplish it successfully, we first use Research Questions 1 and 2 (RQ1 and RQ2) to establish the context. These questions help us understand the design architecture of e-commerce platforms and contextualize major vulnerabilities discovered therein as well as related frauds. Finding research gaps, trends, and opportunities for further research in the field is the goal of our last research question. Below, we list our research questions.
• RQ1: What are the common vulnerabilities on e-commerce platforms?
• RQ2: What are the common frauds in the marketplace or e-commerce domain?
• RQ3: What are the commonly used machine learning and data mining techniques for fraud detection on digital marketplaces or e-commerce platforms, and what does good performance of these techniques look like?
• RQ4: What are the research gaps, trends, and opportunities for future research in this area?

3.2 Data and search strategy

By extracting potential search terms from the titles, abstracts, and subject indexing of three pertinent publications[17, 23, 24], we develop an initial search strategy. We use its results to expand the list of keywords and restrict it to English-language articles in order to further hone this strategy. We then test the validity of our search strategy by checking whether it retrieves the three known relevant studies and two more studies referenced in Ref. [17]. All five studies are successfully identified by the strategy. A group of peer reviewers approves the final search strategy.

Using an iterative search approach, we look for publications within our search period (2010−2023) that have the following keywords in their title or abstract: e-commerce, fraud detection, machine learning, systematic review, organized retail fraud, data mining,
and digital marketplace. We display the iterative approach in the workflow diagram shown in Fig. 3. To reduce the amount of noise in the results, our search strategy employs the search operators “AND”, “OR”, “LIMIT TO”, and “EXCLUDE”.

3.3 Publications Repositories

We focus our search on three international digital repositories: Scopus, Web of Science (WoS), and Google Scholar, which together hold the majority of global scientific research. The initial search query in each repository yields a wide range of publications in a multidisciplinary setting covering, among other things, computer science, engineering, decision science, mathematics, energy, physics, and astronomy. We approach our search with the knowledge that the coverage, accuracy, and access fees of these digital repositories vary. For instance, Scopus and Web of Science overlap in two out of three instances[39], with Scopus offering 20 percent more coverage than Web of Science[40]. Depending on the search terms, Google Scholar frequently provides inaccurate and inconsistent results. Additionally, many of its articles are subpar and out of date (Falagas et al.[40]). Therefore, it helps to find a way to minimize noise and duplicates in the combined search results. To this end, we apply the inclusion and exclusion criteria defined in Table 2.

3.4 PRISMA flow diagram

We use the flow diagram shown in Fig. 4 to illustrate how we apply our inclusion and exclusion criteria to narrow down the most relevant articles for our literature search.

The combined search results contain three hundred and sixty-six articles, which are reduced to three hundred and thirty-five after duplicates are eliminated. In the first step of our exclusion criteria, EF1 eliminates three papers written in a language other than English. In the second step, EF2 eliminates twenty-six papers that come from interdisciplinary fields like medicine. EF3 and EF4 eliminate a combined total of two hundred and nine publications, leaving us with one hundred and one
[Figure residue; recoverable labels: Keyword identification and search definition; Clean data]
Table 2 Inclusion and exclusion criteria used to denoise search results from the electronic repositories.
Inclusion Filter (IF) | Exclusion Filter (EF)
IF1: Articles within the study period (2010−2022) | EF1: Articles not published in English
IF2: Articles that focus on fraud detection on e-commerce platforms | EF2: Articles in unrelated disciplines, e.g., medicine
IF3: Peer reviewed articles | EF3: Articles that are in the form of lecture notes, short papers, posters and book chapters, thesis or dissertations, reviews, and survey articles
IF4: Journal and conference papers | EF4: Articles that do not focus on fraud detection, ML, data mining on e-commerce
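
As a rough illustration of how the filters in Table 2 could be applied to exported records, the sketch below screens a small list of toy records and tallies why each one is dropped. The `Record` fields, the set of unrelated disciplines, the accepted document types, and the order in which the filters are checked are all assumptions made for this example; the `on_topic` flag stands in for a manual relevance judgement and is not an automated classifier.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical bibliographic record; field names are illustrative only."""
    title: str
    year: int
    language: str
    discipline: str
    doc_type: str
    peer_reviewed: bool
    on_topic: bool  # manual judgement: fraud detection with ML/data mining in e-commerce


# Assumed vocabularies for this example, loosely following Table 2.
UNRELATED_DISCIPLINES = {"medicine"}
ACCEPTED_TYPES = {"journal article", "conference paper"}


def rejection_reason(rec, start=2010, end=2022):
    """Return the label of the first filter that rejects rec, or None if it is kept."""
    if not start <= rec.year <= end:
        return "IF1"  # outside the study period
    if rec.language.lower() != "english":
        return "EF1"
    if rec.discipline.lower() in UNRELATED_DISCIPLINES:
        return "EF2"
    if rec.doc_type.lower() not in ACCEPTED_TYPES:
        return "EF3"  # lecture notes, posters, theses, reviews, surveys, ...
    if not rec.peer_reviewed:
        return "IF3"
    if not rec.on_topic:
        return "EF4"
    return None


def screen(records):
    """Drop duplicate titles, apply the filters, and count removals per reason."""
    kept, removed, seen = [], Counter(), set()
    for rec in records:
        key = rec.title.strip().lower()
        if key in seen:
            removed["duplicate"] += 1
            continue
        seen.add(key)
        reason = rejection_reason(rec)
        if reason is None:
            kept.append(rec)
        else:
            removed[reason] += 1
    return kept, removed


# Toy records invented for the example; they are not items from the corpus.
sample = [
    Record("Fraud detection in online auctions", 2019, "English", "computer science", "journal article", True, True),
    Record("Fraud detection in online auctions", 2019, "English", "computer science", "journal article", True, True),
    Record("Betrugserkennung im Onlinehandel", 2020, "German", "computer science", "conference paper", True, True),
    Record("Billing fraud in hospitals", 2021, "English", "medicine", "journal article", True, False),
    Record("A survey of fraud detection", 2018, "English", "computer science", "survey", True, True),
]
kept, removed = screen(sample)
print(len(kept), dict(removed))
```

Recording the first failing filter per record, rather than a plain keep/drop flag, makes it straightforward to report how many records each stage of a PRISMA-style flow removes.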
[Flow diagram; recoverable record counts: Scopus 138; Google Scholar 28]
Fig. 4 PRISMA flow diagram showing detailed filtering levels from a high-level representation of publications from initial search query to a final set of publications for SLR analysis.
papers for our final corpus.

Figure 5 displays the total number of publications over the years of our study period, broken down by article type (conference paper and journal article). Between 2010 and 2018, very few articles were published on the topic of e-commerce fraud detection using machine learning and data mining techniques. However, 2019 and later years see more articles published, with an almost equal split between the two document types, except for 2020, where the number of conference articles is more than double that of journal articles.

3.5 Bibliographic analysis context

For our exploratory work, we use a bibliometric analysis approach to identify the authors of the research articles, their citations, geographic breakdown, and high-level content of their articles. In the end, this exercise aids in our continued refinement of the articles we choose to use to answer our research questions.

3.5.1 Bibliometric analysis

From the combined search results, we create a CSV file that includes the following fields: authors, author(s) ID, title, year, source title, publisher, country, field, ranking, volume, issue, article number, cited by, DOI, link, affiliations, country affiliation, authors with affiliations, abstract, author keywords, index keywords, tradenames, manufacturers, references, correspondence, address, editors, sponsors, publisher, conference name, conference date, conference location, conference code, ISSN, ISBN, CODEN, PubMed ID, language of original document, abbreviated source title, document type, publication stage, open access, source EID. We perform our analysis using VOSviewer[41] as the analysis software and the CSV as the input. This tool enables the creation of bibliometric networks of scientific publications, authors, institutions, and keywords. Co-authorship, co-occurrence, citation, bibliographic coupling, or co-citation links are used to connect the items in these networks[42]. In our analysis, we identify a number of network properties, including clusters and node centrality. These analyses highlight recurring themes in the publications that serve as the basis for our state-of-the-art analysis and discussion.

3.5.2 Constructing bibliometric networks

Building bibliometric networks can be done in a variety of ways, but in this case, we concentrate on two methods: full counting and fractional counting[42]. The
[Bar chart; series: Journal, Conference; x-axis: Year of publication; y-axis: Number of articles]
Fig. 5 Bar graph showing the number of publications focusing on e-commerce fraud detection using machine learning and data mining techniques for each year within our search range (2010−2022).
article mentioned above provides a detailed analysis of the two approaches, including mathematical formulation, but the following co-authorship network example quickly highlights the key distinctions between the two. Take four authors (R1, R2, R3, and R4) and three documents (P1, P2, and P3), as shown in Fig. 6a. P1 is authored by R1, R2, and R3; P2 is authored by R1 and R3; and P3 is authored by R2 and R4. Figure 6b visualizes the networks created using full and fractional counting. The assignment of the strength of the links is the primary distinction between the two strategies.

In the full counting network, the link between R1 and R3 has a strength of 2, indicating that authors R1 and R3 collaborated on publications P1 and P2. The authors joined by each of the other links have co-authored one publication, so those links have a strength of 1. Fractional counting is used to lessen the impact of publications with numerous co-authors. The total number of authors of each co-authored publication, as well as the number of documents each author has co-authored, determines the strength of the co-authorship link between two authors in fractional counting. This logic results in a link strength of 1/n for each co-authorship link in the scenario where an author co-authored a work with n other authors. The strength of the n co-authorship links as a whole is then equal to 1. This is distinct from the full counting case, where each of the n co-authorship links has a strength of 1, for a total strength of n[41]. The aforementioned illustration, which was taken from Ref. [42], applies equally to keyword co-occurrence, bibliographic coupling, and co-citation links. The units of analysis could be researchers, research institutions, countries, and journals.

When the final SLR data is passed on to the VOSviewer software, natural language processing algorithms take over to identify and select terms based on the following steps: (1) removal of copyright statements; (2) sentence detection; (3) part-of-speech tagging; (4) noun phrase identification; and (5) noun phrase unification. These algorithms yield noun phrases identified from the titles and abstracts of the publications used. Phrases are selected from this list by setting certain preferences, such as the minimum number of occurrences and relevance score, and by excluding specific terms that do not add new information, to thin the overall phrase population to only what is important[41]. In our case, we use the fractional counting approach, which gives equal weight to all units, as recommended by Ref. [42].

3.6 Bibliometric network analysis results

The example above was for co-authorship networks, but the same idea can be used for bibliographic coupling networks (with documents, sources, authors, organizations, and countries as units of analysis), co-citation networks (with cited references, cited sources, and cited authors as units of analysis), keyword co-occurrence networks (with author keywords and index keywords as units of analysis), and citation networks (with documents, sources, authors, organizations, and countries as units of analysis). We provide co-authorship, co-citation, and keyword co-occurrence network results below.

3.6.1 Co-authorship networks

As our two units of analysis, we select the country and the researcher. We also set the minimum number of documents co-authored between two countries to two and the minimum number of documents by an author to two as well. In both cases, we ignore documents co-
[Diagram: (a) authorship links between documents P1 to P3 and authors R1 to R4; (b) co-authorship networks under full counting and fractional counting]
Fig. 6 (a) Authorship links and (b) counting techniques for constructing authorship networks.
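
The two counting schemes illustrated in Fig. 6 can be sketched in a few lines. The function below is a minimal illustration, not the VOSviewer implementation: it assumes that, under fractional counting, a paper whose authors each have n co-authors contributes 1/n to each of its co-authorship links, and that under full counting every co-authored paper contributes 1. The function name and the dictionary layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coauthorship_links(papers, fractional=False):
    """Return {(author_a, author_b): link strength} for a co-authorship network."""
    links = defaultdict(float)
    for authors in papers.values():
        authors = sorted(set(authors))
        if len(authors) < 2:
            continue  # single-author papers create no links
        # Full counting: each co-authored paper adds 1 to a link.
        # Fractional counting (assumed rule): a paper in which every author
        # has n co-authors adds 1/n to each of its links.
        weight = 1.0 / (len(authors) - 1) if fractional else 1.0
        for a, b in combinations(authors, 2):
            links[(a, b)] += weight
    return dict(links)


# Authorship data from the example in the text.
papers = {
    "P1": ["R1", "R2", "R3"],
    "P2": ["R1", "R3"],
    "P3": ["R2", "R4"],
}
full = coauthorship_links(papers)
frac = coauthorship_links(papers, fractional=True)
print(full[("R1", "R3")], frac[("R1", "R3")])
```

Under these assumptions the R1 and R3 link has strength 2 with full counting and 1.5 with fractional counting, because the three-author paper P1 contributes only 0.5 to each of its links.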
authored by more than 25 countries or researchers. We find that five out of 25 countries and four out of 254 authors meet these thresholds. For each of the five countries and four authors, we calculate the total strength of the co-authorship links. China and the United States have strong co-authorship links, while researchers Carta S. and Saia R. have the strongest co-authorship links. The details of our co-authorship networks based on country and researcher are summarized in Tables 3 and 4, respectively.

Table 3 Selection of top countries in terms of document co-authorship.
Country | Total link strength | Number of documents | Number of citations
China | 3 | 10 | 166
USA | 3 | 4 | 13
India | 0 | 23 | 98
Indonesia | 0 | 4 | 14
Italy | 0 | 4 | 268

Table 4 Selection of the top authors in terms of co-authorship of documents.
Researcher | Total link strength | Number of documents | Number of citations
Carta Salvatore | 2 | 2 | 48
Saia Roberto | 2 | 3 | 52
Kawase Ricardo | 0 | 2 | 3
Li Zou | 0 | 3 | 142

In summary, our co-authorship networks show that, in this domain, collaboration between researchers and across countries is low. This result could perhaps be explained by the sensitivity surrounding fraud data.

In Table 5, we observe that India leads in the authorship of research articles in this domain. China ranks second, and the USA, Italy, and Iran share third place in article authorship.

Table 5 Top ten countries by authorship
Country | Number of articles
India | 30
China | 17
USA | 5
Italy | 5
Iran | 5
UK | 3
Indonesia | 3
Kingdom of Saudi Arabia | 3
South Africa | 2
Russia | 2

3.6.2 Co-citation networks

In a co-citation network analysis of researchers, the relatedness of researchers is determined based on the degree to which they are cited in the same publications. The more often researchers are cited in the same publications, the stronger their relatedness. We construct co-citation networks for the cited reference, cited source, and cited authors. Setting the minimum number of citations for a cited reference to 3, we find that 87 of the cited references meet this threshold. We calculate the total link strength for each of the 87 references and select those with the greatest link strength, as shown in Table 6. Articles written by Refs. [43–46] show strong linkages in our co-citation networks.

3.6.3 Keywords co-occurrence networks

A crucial puzzle piece is the co-occurrence of keywords. It helps us understand the research themes of our search results and confirms the accuracy of our search criteria. For our context, we analyze the data using two units: all keywords and the author’s keywords. Thirty-four out of 593 keywords meet the criteria when the minimum number of occurrences for all keywords is set to 5. We determine the overall strength of the co-occurrence links between each of the 34 keywords and choose the ones that have the strongest links. Similar steps are taken for the author’s keywords, and we discover that thirteen out of 251 keywords meet the necessary threshold and that seven out of the 13 have excellent link strength. We summarize the results of the keyword co-occurrence networks in Tables 7 and 8 below.

The most prominent topic in this area is credit card fraud detection, and the most widely used machine learning techniques are decision trees, random forests, and logistic regression. The frequency of keyword co-occurrences in our corpus is demonstrated in Fig. 7 below.

4 Detailed Analysis and Results from Corpus

To address each research question posed in Section 3.1, we present the findings of our analysis of the literature corpus in this section.

4.1 RQ1: What are the common vulnerability areas in the marketplace or e-commerce domain?

Our corpus surfaces key vulnerabilities on e-commerce platforms, as shown in the architecture diagram for e-commerce systems in Fig. 8. We describe each one of them in subsequent subsections.
Table 6 Topmost cited articles ranked on total link strength and illustration of the methods and domains covered in the articles.
Cited reference | Title | Method | Fraud domain | Number of citations | Total link strength
[43] | A blockchain, smart contract and data mining based approach toward the betterment of e-commerce | Rule-based methods | Phishing | 216 | 18
[44] | A hybrid machine learning framework for e-commerce fraud detection | Artificial Neural Networks (ANNs), decision trees, and copula models | Bank/payments | 120 | 15
[45] | A machine learning based credit card fraud detection using the GA algorithm for feature selection | Genetic algorithm, decision trees, random forest, Naïve Bayes, and logistic regression | Credit card | 80 | 13
[46] | A proposed fraud detection model based on e-payments attributes a case study in Egyptian e-payment gateway | Decision trees | E-payments | 79 | 13
[47] | A study on fraud detection in the C2C used trade market using Doc2vec | Natural language processing (Doc2Vec) and random forest | E-payments | 68 | 10
[48] | Account takeover detection on e-commerce platforms | ANN | Account take-over | 63 | 8
[49] | An analysis on fraud detection in credit card transactions using machine learning techniques | Decision trees, random forest, KNN, and logistic regression | Credit card | 59 | 7
[50] | An innovative sensing machine learning technique to detect credit card frauds in wireless communications | Support Vector Machine (SVM) | Credit card | 54 | 6
[51] | Analysis of supervised machine learning algorithms in the context of fraud detection | SVM, logistic regression, and imbalanced learning | Credit card | 51 | 5
Table 7 Top keywords of all keywords.
Keyword | Total link strength | Number of occurrences
Crime | 45 | 45
Fraud detection | 43 | 52
Machine learning | 37 | 37
Electronic commerce | 21 | 21
Decision trees | 20 | 20
Credit card fraud detection | 19 | 19

Table 8 Top author's keywords.
Author's keyword | Total link strength | Number of occurrences
Fraud detection | 29 | 44
Machine learning | 22 | 24
E-commerce | 11 | 14
Credit card fraud | 9 | 9
Classification | 8 | 9
Logistic regression | 8 | 8
Random forest | 6 | 6

4.1.1 Certificate duplicity
Users must authenticate themselves by providing their credentials to web hosts in order to use an e-commerce platform. In exchange, users are given an authentication credential as proof of certification. Once the user has received authentication, they can access the service. The authentication certificate's issuance procedure is managed by an Identity Management Service (IMS). It is possible to generate a duplicate certificate or forge one and issue it to the web server instead of the original user, bypassing identity authentication to grant access to the service. Attackers or fraudsters could take advantage of this vulnerability to place unauthorized orders for goods or make unauthorized purchases.
4.1.2 Unsecure protocol
"Man-in-the-middle" attacks, in which attackers or fraudsters establish a connection with message transmitters and receivers, could take advantage of this vulnerability. In this instance, attackers create the impression that the sender and the receiver are speaking directly to one another by relaying messages between them. The assailant can easily encode the communication and use it to commit heinous fraud.
4.1.3 No filters mentioned on the application level
An implementation of security code known as a filter is used in web applications to intercept, examine, and respond to requests made to those applications[52]. Without a filter at the application level, a hacker or fraudster may be able to send malicious code through
[Figure: keyword network with nodes including classification, credit card fraud detection, fraud, logistic regression, machine learning, credit card, decision tree, and data mining.]
Fig. 7 Network map of keywords co-occurrence showing the most common keywords in the corpus.
[Figure: e-commerce platform diagram with labels certificate duplicity, unsecured protocol, and denial of service.]
the web application and carry out actions such as cross-site scripting and local or remote file inclusion, among other things. Such actions could potentially lead to fraud.
4.1.4 Denial of service
A vulnerability can be exploited to make the e-commerce system unavailable to its intended users. This vulnerability can be leveraged by fraudsters targeting e-commerce traffic by rendering services unavailable to gullible consumers.
4.1.5 Unsecure database
The database is maintained on the same server in most e-commerce models without passing through additional security barriers. Such a flaw could be used by fraudsters to insert malware into the database, cause important data leaks, and launch a variety of fraud schemes.

4.2 RQ2: What are the most common e-commerce frauds?
On e-commerce platforms, fraudsters use vulnerabilities known to them to wage attacks and commit fraud. Once the weaknesses are clearly understood, countermeasures can be created to lessen the risk of fraud and combat its effects. In this question, we use our corpus to highlight significant e-commerce frauds and the solutions researchers have suggested. Our corpus reveals five types of fraud that can be thwarted using machine learning and data mining techniques. These include financial or payment frauds, web application frauds, spam or phishing frauds, triangulation frauds, and bot frauds. Figure 9 demonstrates where these frauds take place on the e-commerce platform, while Fig. 10 shows the share of articles within our corpus addressing each fraud type.
4.2.1 Financial frauds or payment frauds
This type of fraud is the most prevalent on e-commerce platforms and has existed since the beginning of businesses' shift from physical to online locations.
[Figure: e-commerce platform diagram with labels triangulation fraud, web application fraud, and bot fraud.]
Fig. 10 E-commerce frauds and percentage of articles from our corpus addressing each fraud type.
Using financial or payment information obtained through the exploitation of the aforementioned vulnerabilities, fraudsters frequently carry out unauthorized transactions. In our work, we do not address the architecture of the online payment process or the classification of the sub-fraud types under financial frauds, but Ref. [20] provides a detailed illustration of the components related to e-commerce payments. Our work provides a high-level illustration of platform frauds and vulnerabilities (see Figs. 8 and 9). According to our research, there are three main types of financial or payment fraud: online transactions, bank payments, and credit card transactions. With 87 percent of the articles in the corpus focusing on it, credit card fraud is the most prevalent category. This is not surprising given that credit cards have become the most common form of payment used for shopping on e-commerce platforms. Bank payment fraud comes in second with about 10 percent of the articles, and online transaction fraud is ranked as the third subcategory with 3 percent of the articles (see Fig. 11 for this breakdown). Due to the sheer volume of articles in this category, we will not list them all, but a few stand out, such as Ref. [53], which suggests a machine learning-based credit fraud detection engine using a genetic algorithm for feature selection. The authors use a data set generated from European market card holders to test the performance of their engine. A study by Ref. [54] proposes a deep learning-based algorithm for credit card fraud detection dubbed Multi-Class Neural Network (MCNN). This method incorporates a class rebalancing mechanism to deal with the class imbalance problem that often appears in fraud data sets. Another study in this domain addresses bank payment fraud by taking the initial detection problem and transforming it into a pseudo-recommendation problem, which is then solved using a ranking metric embedding-based method. This approach solves the data scarcity issue often encountered in situations where historical fraudulent behaviors are nonexistent for certain individuals by leveraging collaborative filtering techniques to create similarity profiles between individuals. In summary, there are more than 63 articles covering this category.
4.2.2 Web application fraud
This is the second-largest (20 percent) type of fraud addressed by articles in our literature. Fraudsters in this category exploit poorly developed e-commerce websites (front-end) to defraud unsuspecting shoppers. Common fraud activities in this type of fraud include fake transactions and gift card fraud[55]. In this category, researchers employ both machine learning and data mining techniques for detection. Reference [56] addresses the reduplication of accounts by users who seek to get more coupons or promotions fraudulently. This is a well-known type of abuse that bad consumers use and can lead to huge losses for companies as well as misleading user information. The researchers in this study use data mining techniques like J48 to detect promo misuse based on customer profiles. Another study by Ref. [57] proposes an unsupervised learning method based on a finite mixture model to identify pricing frauds on e-commerce web sites. A final study worth mentioning is by Ref. [58]. These researchers focus on detecting fictitious account registrations using Long Short-Term Memory (LSTM) and applying Synthetic Minority OverSampling Technique (SMOTE) and adaptive synthetic sampling (ADASYN) for class imbalance treatment. The remaining articles in this category are shown in Table 9.
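The data-level rebalancing idea behind SMOTE, mentioned above for Ref. [58], can be illustrated with a minimal sketch. This is not the implementation used in any of the cited studies: the function name `smote_like_oversample`, the toy feature vectors, and the parameter defaults are assumptions made for illustration, and only the core interpolation step for numeric features is shown. In practice, a maintained library such as imbalanced-learn would normally be preferred.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours,
    which is the core interpolation idea of SMOTE for numeric features.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Rank the remaining minority points by squared Euclidean distance.
        neighbours = sorted(
            others,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy, made-up feature vectors standing in for fraudulent records.
fraud = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like_oversample(fraud, n_new=6)
print(len(new_points), "synthetic minority samples generated")
```

Because every synthetic point is an interpolation between two observed minority samples, it always falls inside the range spanned by the original minority data. Oversampling of this kind should be applied to the training split only, so that evaluation data remain untouched.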
[Figure: chart of the number of fraud articles (y-axis, 0 to 35) by year (x-axis, 2010 to 2022), broken down by method used: logistic regression, decision tree, random forest, Naïve Bayes, SVM, ANN, K-nearest neighbor, boosting algorithms, and others.]
Fig. 11 Evolution of the use of machine learning and data mining techniques for e-commerce fraud detection over the years.
experiments, comparing their proposed solution with current methods on a test data set containing 900 websites. They determine that the SLT method can more accurately detect fake websites by utilizing a richer set of fraud cues in combination with domain-specific knowledge.
4.2.5 Bot fraud
Fraudsters are constantly evolving their methods to outsmart fraud detection systems. Bots can be used by fraudsters to defraud e-commerce enterprises. For example, bots can be used to mimic the behavior of real users without being detected. These, in effect, could dupe e-commerce enterprises into misleading investments to the detriment of consumers or investors. Bots could also be used by fraudsters to steal customer data, such as bank and payment details, which could in turn facilitate other types of fraud. We also find two articles in this category in our literature review. Reference [65] uses an extended boosting approach that incorporates prior human knowledge, in the form of expert rules and blacklists, to compensate for data shortages. The method is tested against a mobile application with over 150 million users and achieves an accuracy score of 98 percent and a recall of 94 percent. The researchers surface key behavior patterns of bots that include less spatial motion as detected by device sensors (1/10 of human users), a higher IP clustering ratio (60 percent in bots vs. 15 percent in human users), a higher jailbroken device rate (92 percent in bots vs. 4 percent in human users), more irregular device names, and fewer IP address changes in bots. The final article in the category[66] looks at the issue of cloud bots and how they can be used to perform click fraud, register fake accounts, and commit other types of fraud. The researchers propose a traffic-based quasi-real-time method for cloud bot detection using machine learning that exploits a new sample partitioning approach as well as innovative multi-layer features that reveal the essential difference between bots and human traffic. Their approach achieves 93 percent precision in the experimental setting, performs equally well in a real-world data setting, and proves robust for detecting unknown cloud bots as well as addressing concept drift over time.

4.3 RQ3: What are the commonly used machine learning and data mining techniques for fraud detection on digital marketplaces or e-commerce platforms, and what does good performance of these techniques look like?
This is the most important question for our review, and we use the previous questions to set the scene for it. In this context, we focus on machine learning and data mining applications for tackling fraud detection in the e-commerce domain. This implies that we do not look at other methods like statistical inference techniques, ontologies, or even bespoke algorithms that could be relevant in the domain. One more thing to note is that we only focus on detection methods. In Table 9, we summarize the distribution of these methods across our corpus.
There are many algorithms applied in these articles, and therefore we only consider those that are used in more than two articles. In Table 9, we show the evolution and frequency of use of the algorithms from the corpus over the years, and a visual summary of the same is shown in Fig. 11. In Table 10, we show the types of fraud in our domain and the number of articles covering them. The family of artificial neural networks is the most frequently applied machine learning category in e-commerce fraud detection, featured in more than a third of the articles. It is used frequently in articles focusing on credit card, web application, and phishing frauds. While we present this set of algorithms under a broad category, the results show a variety of specific algorithms, such as deep recurrent
Table 10 Summary of fraud type and article representation from the corpus
Total | Fraud type | Subtype | Articles
68 | Financial/Payments | Credit card | [16, 45, 50, 51, 53, 54, 67, 68, 70, 74, 77, 78, 80, 82, 83, 85, 87−90, 93−96, 98, 101, 104, 105, 109−113, 115, 118, 119, 121, 123, 124, 126, 128−159]
 | Financial/Payments | Bank payments | [44, 84, 92, 116, 160−162]
 | Financial/Payments | Online transactions | [46, 47, 69, 79, 163]
17 | Web application | - | [43, 56, 57, 72, 81, 86, 97, 107, 114, 164−170]
6 | Spam/Phishing | - | [43, 59, 60, 61, 62, 122]
4 | Triangulation | - | [48, 63, 64, 91]
2 | Bot | - | [66, 171]
neural networks, graph neural networks, multilayer perceptron networks, and LSTM. The results also show that the use of ANNs gained more traction in e-commerce fraud detection around 2019.
The second largest category of algorithms is the random forest algorithm, which features in about 21 articles in our literature corpus. Many articles conduct performance tests in experimental settings where, for each data set, several algorithms are jointly tested. In these cases, the authors report the random forest and the ANNs as the best performers[44, 60, 133]. Decision trees and logistic regression are some of the other notable algorithms.
Our search strategy exposes a couple of data mining strategies for e-commerce fraud detection in addition to common machine learning methods. The algorithms discussed in these strategies do not appear to be clustered, so we collectively refer to them as the "other" category. There are roughly 16 articles in this category, and they primarily cover three types of fraud: web application fraud, credit card fraud, and phishing. One such article focuses on an application in an emerging fraud area, triangulation fraud, as seen in Ref. [64]. These researchers apply SLT to build a prototype for fake site detection, which they test on about nine hundred sites.

4.4 RQ4: What are the research gaps, trends, and opportunities for future research in this area?
In this question, we show research gaps to inform future research directions. We synthesize all the articles in the final corpus to understand how the articles apply machine learning and data mining techniques for e-commerce fraud detection and to surface gaps in their usage. We cover bespoke gaps in the following subsections.
4.4.1 Class asymmetry
The issue of imbalanced classes between fraudulent and legitimate transactions is rife in fraud data[141]. It occurs when there is an asymmetric distribution between classes in the data. In the machine learning domain, most algorithms do not perform well on imbalanced data, as the minority class contributes less to the learning objective[172].
When training on an imbalanced data set with a standard classification method, the minority class contributes less towards the minimization of the objective function[173], leading to lower classification accuracy for the minority class and poor performance of the classifier as a whole. For example, a binary classifier that achieves 99 percent training accuracy on imbalanced data with 1 percent minority samples would be irrelevant for predictions on out-of-sample data. In this case, the classifier is only accurate at predicting the majority, while its performance on the minority class is poor (often, all the instances of the minority class are misclassified as instances of the majority class). This is a costly decision because, in most practical applications, classifying the minority instances correctly is more important[173]. Therefore, it is of paramount importance to improve a classifier's ability to recognize the minority class in these settings.
Researchers have developed a variety of techniques to address this problem for all types of data. Techniques such as SMOTE[174] variants are easy to apply and frequently improve classifier performance significantly in both categorical and numeric data sets. In our literature corpus, we observe a mix of situations in which imbalanced learning techniques are applied to improve classifier performance; however, there are some articles that do not use them. What is more, accuracy, a highly misleading metric on imbalanced data, is used for performance evaluation in these articles. While we acknowledge that some of the machine learning algorithms could inherently be applying a rebalancing mechanism during training (algorithmic vs. data-level strategy for class balancing), we observe that only about 10 percent of all articles using machine learning for fraud detection explicitly talk about their class imbalance resolution strategies. Future work should factor in appropriate imbalanced learning techniques in their fraud detection designs whenever machine learning approaches are applied.
4.4.2 Training data
One criticism of machine learning and data mining for fraud detection is the lack of good practical data to use for training algorithms[23]. Real fraud data often carry sensitive information about consumers, and as such, companies are constrained by data protection laws from sharing such data. Additionally, it is counterintuitive to openly share data and fraud detection strategies, as fraudsters can use that information to escape detection systems. These challenges make it hard to advance fraud detection research in general. We observe minimal use of real-world fraud data in our corpus, and in those few cases, the actual details of features used for training the detection algorithms are hardly mentioned. Some of the
articles using real-world data include Ref. [44], which uses a real dataset from one of Egypt's top e-payment gateways, and Ref. [71], which tests its fraud detection system with real-world day-to-day transaction data from European banks. The implication of the lack of real-world data is that the majority of the research articles in this domain are experimental and likely will not result in real-world fraud detection systems. Future work could look for sandbox environments that can allow fraud researchers to work with real-world fraud data to advance the field.
We also observe a general overreliance on transaction data for training machine learning models. While these data still achieve good performance, there is likely a missed opportunity in training machine learning models on multimodal data such as images and text beyond the usual numeric and categorical features mined from transaction histories. The last decade has seen significant advances in Natural Language Processing (NLP) and computer vision techniques that can do well in creating multiple learning contexts to build detection systems that are robust to the high level of dynamism observed in various fraud domains. A few articles in our corpus incorporate text-based methods[47, 95, 104, 106] in their detection models, but there are no articles looking at multi-modal approaches incorporating image data into training. This is a future research opportunity area for this domain.
4.4.3 Detection algorithms
The use of ANNs to create fraud detection systems in the e-commerce fraud domain is a clear trend in our data. More than 30 percent of all articles use ANNs as their primary learning technique. ANNs use data and information processing techniques inspired by biological neural network behavior, and they are powerful when used on big data[154]. This explains their popularity in credit card fraud detection, where they can be trained using massive amounts of high-velocity transaction data. This trend is reflected in our data set, in which nearly 60 percent of all articles using ANNs are geared towards credit card fraud detection. Despite being widely used and achieving good discriminatory performance, these techniques lack interpretability, making it difficult for researchers and practitioners to comprehend the signals that lead to fraud. As a result, their use necessitates a conscious decision to optimize performance as opposed to deciphering the underlying indicators associated with fraudulent instances. Future research and applications can put their lack of interpretability into design considerations. We also observe that the random forest is highly featured in our data. It achieves high performance and is interpretable. It is possible to tease out the importance of features' contributions towards the minimization of the objective function. As such, researchers and practitioners can glean from features highly associated with fraudulent instances. In summary, ANNs and the random forest algorithm provide a healthy trade-off between performance and interpretability.

5 Discussion
The objective of the systematic review was to find the state-of-the-art literature on machine learning and data mining techniques for fraud detection in the e-commerce domain. We narrowed down our search to these methods because we believe they will be more effective than heuristics and rule-based approaches at thwarting different sorts of fraud in this domain. Additionally, they are simple to monitor for drift, quick to use in production, and extremely flexible in a highly dynamic fraud environment where fraudsters are constantly coming up with new and creative ways to beat bespoke fraud detection systems. To find and examine the most pertinent papers for this fraud topic, we use a combination of the PRISMA SLR technique and content analysis as part of our methodology. We choose to focus on e-commerce fraud detection with machine learning and data mining techniques because it has not been covered in previous literature reviews and, moreover, because our methodology has not been applied before in this context. As a result, compared to studies that covered more expansive domains, such as Refs. [13, 175], we surface more relevant articles. Our work spends less energy and time analyzing various fraud domains and types than the majority of fraud reviews we have encountered so far in the literature. Our attention is concentrated on the most significant fraud classes within the e-commerce fraud area that can be effectively addressed by the use of machine learning and data mining techniques. Our findings differ from those of similar research in that they show an increase in the use of artificial neural network approaches. In one such study[13], Naïve Bayes is found to be the most commonly used machine learning algorithm, while another finds the random forest and logistic regression as the most frequently used algorithms for fraud detection systems.
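The caution in Section 4.4.1 about accuracy on imbalanced data can be made concrete with a small worked example. The sketch below uses synthetic labels (1,000 transactions with 1 percent fraud) and a trivial predictor that labels everything as legitimate. The helper name `evaluate` and the data are assumptions made for illustration and are not drawn from any article in the corpus.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, recall, and precision for a binary problem.

    Hypothetical helper; recall and precision are reported as 0.0 when
    their denominators are zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Synthetic labels: 10 fraudulent (1) and 990 legitimate (0) transactions.
y_true = [1] * 10 + [0] * 990
# A degenerate classifier that predicts "legitimate" for every record.
always_legitimate = [0] * len(y_true)

metrics = evaluate(y_true, always_legitimate)
print(metrics)
```

Under these assumptions the degenerate predictor scores 99 percent accuracy while detecting none of the fraudulent records, so recall on the fraud class is zero. This is why minority-class metrics such as recall and precision are more informative than accuracy when reviewing fraud detection results.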
Among the list of common frauds identified by the literature for e-commerce platforms, credit card fraud is the most researched based on machine learning and data mining techniques. There are a few factors that we think explain this phenomenon. First, credit card payments are most preferred for online payments and have become ubiquitous for use on e-commerce sites[166], generating enormous amounts of high-velocity transaction data; second, with such large volumes of data, heuristics and basic rule-based methods are challenged; and third, access to artificial intelligence and machine learning tools has improved significantly in recent years due to advances in cloud computing technology and reduced compute costs. High detection rates and lower false-positive rates achieved by these methods also make them preferable for building these detection systems.
Notably, there is one common e-commerce fraud type that is not surfaced in detail by any of the articles in our corpus. Reseller fraud, a deceptive practice in which a seller purchases products from a company or retailer with the intent of reselling them at inflated prices, is a missed opportunity that can be addressed by subsequent work. It takes advantage of limited supply, high demand, or exclusive products to manipulate the market and profit from the price difference. We believe this could have been a common occurrence across some product domains during the COVID-19 pandemic when supply chain disruptions and limited manufacturing were rampant.
The final point worth highlighting for our discussion is the emerging fraud types within this domain. Triangulation and bot fraud are new to the e-commerce domain and have tremendous potential for huge losses to consumers and merchants alike because of their ability to scale quickly. For example, triangulation fraudsters can gain access to the entire transacting base of a real e-commerce customer if undetected for a sufficiently long time. On the other hand, bots can work relentlessly, and their actions can achieve high multiplier effects; therefore, they can cause huge damages within short periods of time. In our literature corpus, we only find two articles representing each of these fraud types. Given their pervasiveness, more research should be generated on these subdomains.
Information asymmetry makes it possible for fraudsters to create fake sites and stay undetected for a long time, especially for market players without robust detection systems. This is where perpetrators of triangulation fraud step in to create fake e-commerce sites that are identical to existing real ones like Amazon and eBay, which they then use to commit fraudulent purchases using stolen payment and personal consumer information like residence addresses. Many forms of fake and deceptive websites have appeared in the recent past, including spoof and concocted sites. Spoof sites are replicas of real commercial sites intended to deceive the real site's customers into providing their information, while concocted sites are deceptive websites attempting to appear as unique, legitimate commercial entities[64]. Solutions such as introducing regulations, providing warranties or guarantees on items sold, providing insurance, and making bottom-up efforts to inform consumers of products and sellers' quality and reputation could fix the information asymmetry problem and reduce the impact of these types of fraud on consumers and businesses.
Machine learning and data mining techniques have proven effective in detecting various types of current e-commerce fraud by leveraging pattern recognition, anomaly detection, and predictive modeling. However, they are not a panacea, particularly for emerging types of fraud. While mature types of fraud, such as account takeover fraud, phishing, social engineering, review and rating manipulation, and inventory and price manipulation, can be effectively detected using machine learning models, emerging types like bot fraud and triangulation fraud present challenges.
There are several factors contributing to the difficulty in addressing emerging fraud types. First, the quality and representativeness of training data are often lacking, as data logging and quality assurance systems may lag behind emerging fraud activity. Second, developing effective features specific to these fraud types requires time and specialized skills. Third, while transparent models like decision trees can provide explanations for fraud detection decisions, they may not perform well in emerging fraud types, necessitating the use of more complex models like deep learning algorithms. Fourth, emerging fraud types may actively try to manipulate machine learning models through adversarial attacks, posing additional challenges. Lastly, the effectiveness of machine learning models in detecting fraud degrades over time, necessitating regular updates, retraining, and evaluation.
To build a future e-commerce fraud detection system
using machine learning techniques, these factors must be considered during design, implementation, and maintenance to ensure ongoing effectiveness. Vigilance, collaboration, and adaptation are essential in this dynamic field. By addressing these factors, fraud detection systems can achieve higher accuracy, faster response times, and improved resilience against evolving fraud tactics.
It is important to note that the future state of e-commerce fraud detection is a dynamic environment, driven by ongoing research, technological innovation, and evolving fraud tactics. Regular updates, collaboration between data scientists and fraud prevention teams, and continuous evaluation and refinement of models are crucial to stay at the forefront of fraud detection capabilities in the e-commerce domain.
Finally, while this research aims to provide a comprehensive overview of the current state of knowledge in the domain, there are limitations to be acknowledged. Language and accessibility barriers exist, as the review is conducted in English and non-English articles are excluded, potentially omitting important work. Additionally, access limitations to subscription-based journals may have resulted in incomplete coverage of available publications in the domain.

6 Conclusion
In this article, we employed a combined PRISMA and content synthesis approach to identify and analyze relevant articles focusing on fraud detection in the e-commerce domain using machine learning and data mining techniques. Our survey encompassed a total of 101 articles, with 16 of them classified as "other" due to being unclustered data mining techniques, while the remaining articles fell under the mainstream machine learning cluster.
To structure our analysis, we formulated four research questions, with the first two providing context for our main question. Among the machine learning algorithms utilized, ANNs emerged as the most frequently employed, closely followed by random forest. Notably, the majority of articles centered around the detection of credit card fraud, showcasing its prevalence in the field. However, we found a dearth of detailed research addressing reseller fraud, also known as product flipping or scalping, within our corpus, highlighting a potential avenue for future investigation given its significance in the e-commerce domain and potential impact on the economy and households. Further exploration of various techniques, including machine learning, to combat reseller fraud could be a fruitful area for future work.
Our review also shed light on emerging fraud types in e-commerce, namely triangulation and bot fraud, which have received limited attention in the realm of machine learning and data mining techniques. This observation underscores the need for further research to address these novel fraud types effectively.
Furthermore, our analysis revealed a growing demand for the application of imbalanced learning techniques to enhance future fraud detection systems. This indicates an opportunity for the concerted use of such techniques to tackle the challenge posed by imbalanced datasets in fraud detection.
The findings of our work have practical implications for practitioners in the e-commerce industry. They can replicate the approaches discussed in our corpus and implement them to proactively identify and eliminate malicious actors from their platforms, thereby reducing losses and safeguarding their brand reputations. Additionally, our survey contributes to the existing body of knowledge and literature on fraud detection in the e-commerce domain, providing valuable insights for future research endeavors.
Overall, our study serves as a comprehensive survey that informs both practitioners and researchers, facilitating the advancement of fraud detection techniques in the e-commerce domain.

References
[1] S. Monteith, M. Bauer, M. Alda, J. Geddes, P. C. Whybrow, and T. Glenn, Increasing cybercrime since the pandemic: Concerns for psychiatry, Curr. Psychiatry Rep., vol. 23, no. 4, p. 18, 2021.
[2] S. Kodate, R. Chiba, S. Kimura, and N. Masuda, Detecting problematic transactions in a consumer-to-consumer e-commerce network, Appl. Netw. Sci., vol. 5, no. 1, p. 90, 2020.
[3] R. Samani and G. Davis, McAfee mobile threat report, https://www.mcafee.com/enterprise/en-us/assets/reports/rp-mobile-threat-report-2019.pdf, 2019.
[4] E. W. T. Ngai, Y. Hu, Y. H. Wong, Y. Chen, and X. Sun, The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature, Decis. Support Syst., vol. 50, no. 3, pp. 559–569, 2011.
[5] Sam Smith and Juniper Research, Online payment fraud: Market forecasts, emerging threats & segment analysis 2022-2027, https://www.juniperresearch.com/
438 Big Data Mining and Analytics, June 2024, 7(2): 419−444
[35] S. Yin and X. Luo, A review of learning-based Neural Netw. Learn. Syst., vol. 29, no. 9, pp. 4065–4076,
E-commerce, in Proc. 16th Int. Conf. Intelligent Systems 2018.
and Knowledge Engineering, Chengdu, China, 2021, pp. [50] G. Sasikala, M. Laavanya, B. Sathyasri, C. Supraja, V.
483–490. Mahalakshmi, S. S. S. Mole, J. Mulerikkal, S.
[36] H. Paul and A. Nikolaev, Fake review detection on online Chidambaranathan, C. Arvind, K. Srihari, et al., An
E-commerce platforms: A systematic literature review, innovative sensing machine learning technique to detect
Data Min. Knowl. Discov., vol. 35, no. 5, pp. 1830–1881, credit card frauds in wireless communications, Wirel.
2021. Commun. Mob. Comput., vol. 2022, p. 2439205, 2022.
[37] P. Gamini, S. T. Yerramsetti, G. D. Darapu, V. K. [51] P. Verma and P. Tyagi, Analysis of supervised machine
Pentakoti, and P. R. Vegesena, A review on the learning algorithms in the context of fraud detection, ECS
performance analysis of supervised and unsupervised Trans., vol. 107, no. 1, pp. 7189–7200, 2022.
algorithms in credit card fraud detection, Int. J. Res. Eng. [52] A. Baishya and S. Kakoty, A review on web content
Sci. Manag., vol. 4, no. 8, pp. 23–26, 2021. filtering, its technique and prospects, Int. J. Comput. Sci.
[38] M. Petticrew and H. Roberts, Systematic Reviews in the Trends Technol., vol. 7, no. 3, pp. 37–40, 2019.
Social Sciences: A Practical Guide. Oxford, UK: Wiley- [53] E. Ileberi, Y. Sun, and Z. Wang, Performance evaluation
Blackwell, 2006. of machine learning methods for credit card fraud
[39] E. S. Gualberto, R. T. De Sousa, T. P. De B Vieira, J. P. detection using SMOTE and AdaBoost, IEEE Access,
C. L. Da Costa, and C. G. Duque, From feature vol. 9, pp. 165286–165294, 2021.
engineering and topics models to enhanced prediction [54] N. Prabha and S. Manimekalai, Imbalanced data
rates in phishing detection, IEEE Access, vol. 8, pp. classification in credit card fraudulent activities detection
76368–76385, 2020. using multi-class neural network, in Proc. 2nd Int. Conf.
[40] M. E. Falagas, E. I. Pitsouni, G. A. Malietzis, and G. Artificial Intelligence and Smart Energy, Coimbatore,
Pappas, Comparison of PubMed, Scopus, Web of India, 2022, pp. 131–138.
Science, and Google Scholar: Strengths and weaknesses, [55] P. S. Lokhande and B. B. Meshram, E-commerce
FASEB J., vol. 22, no. 2, pp. 338–342, 2008. applications: Vulnerabilities, attacks and countermeasures,
[41] N. J. van Eck and L. Waltman, Software survey: https://www.researchgate.net/publication/235697382_E-
VOSviewer, a computer program for bibliometric Commerce_Applications_Vulnerabilities_Attacks_and_C
mapping, Scientometrics, vol. 84, no. 2, pp. 523–538, ountermeasUres, 2022.
2010. [56] T. Mauritsius, S. Alatas, F. Binsar, R. Jayadi, and N.
[42] A. Perianes-Rodriguez, L. Waltman, and N. J. Van Eck, Legowo, Promo abuse modeling in e-commerce using
Constructing bibliometric networks: A comparison machine learning approach, in Proc. 8th Int. Conf.
between full and fractional counting, J. Informetr., vol. Orange Technology, Daegu, Republic of Korea, 2020,
10, no. 4, pp. 1178–1195, 2016. pp. 1–6.
[43] T. H. Pranto, K. T. A. M. Hasib, T. Rahman, A. B. [57] K. Kim, Y. Choi, and J. Park, Pricing fraud detection in
Haque, A. K. M. N. Islam, and R. M. Rahman, online shopping malls using a finite mixture model,
Blockchain and machine learning for fraud detection: A Electron. Commer. Res. Appl., vol. 12, no. 3, pp.
privacy-preserving and adaptive incentive based 195–207, 2013.
approach, IEEE Access, vol. 10, pp. 87115–87134, 2022. [58] A. G. Marakhtanov, E. O. Parenchenkov, and N. V.
[44] Y. Y. Festa and I. A. Vorobyev, A hybrid machine Smirnov, Detection of fictitious accounts registration, in
learning framework for e-commerce fraud detection, Proc. Int. Russian Automation Conf., Sochi, Russia,
Model Assist. Stat. Appl., vol. 17, no. 1, pp. 41–49, 2022. 2021, pp. 226–230.
[45] E. Ileberi, Y. Sun, and Z. Wang, A machine learning [59] I. Saha, D. Sarma, R. J. Chakma, M. N. Alam, A.
based credit card fraud detection using the GA algorithm Sultana, and S. Hossain, Phishing attacks detection using
for feature selection, J. Big Data, vol. 9, no. 1, p. 24, deep learning approach, in Proc. 3rd Int. Conf. Smart
2022. Systems and Inventive Technology, Tirunelveli, India,
[46] M. H. Nasr, M. H. Farrag, and M. M. Nasr, A proposed 2020, pp. 1180–1185.
fraud detection model based on e-Payments attributes a [60] A. Zamir, H. U. Khan, T. Iqbal, N. Yousaf, F. Aslam, A.
case study in Egyptian e-Payment gateway, Int. J. Adv. Anjum, and M. Hamdani, Phishing web site detection
Comput. Sci. Appl., vol. 13, no. 5, pp. 179–186, 2022. using diverse machine learning algorithms, Electronic
[47] D. H. Lim and H. Ahn, A study on fraud detection in the Library, vol. 38, no. 1, pp. 65–80, 2020.
C2C used trade market using Doc2vec, J. Korea Soc. [61] F. Hasan, S. K. Mondal, M. R. Kabir, M. A. Al Mamun,
Comput. Inform., vol. 27, no. 3, pp. 173–182, 2022. N. S. Rahman, and M. S. Hossen, E-commerce merchant
[48] M. Gao, Account takeover detection on E-commerce fraud detection using machine learning approach, in
platforms, in Proc. IEEE Int. Conf. Smart Computing, Proc. 7th Int. Conf. Communication and Electronics
Helsinki, Finland, 2022, pp. 196–197. Systems, Coimbatore, India, 2022, pp. 1123–1127.
[49] J. Mathew, C. K. Pang, M. Luo, and W. H. Leong, [62] S. Carta, G. Fenu, D. R. Recupero, and R. Saia, Fraud
Classification of imbalanced data by oversampling in detection for E-commerce transactions by employing a
kernel space of support vector machines, IEEE Trans. prudential multiple consensus model, J. Inform. Secur.
440 Big Data Mining and Analytics, June 2024, 7(2): 419−444
Appl., vol. 46, pp. 13–22, 2019. [76] H. Zhou, G. Sun, S. Fu, W. Jiang, and J. Xue, A scalable
[63] L. Beltzung, A. Lindley, O. Dinica, N. Hermann, and R. approach for fraud detection in online e-commerce
Lindner, Real-time detection of fake-shops through transactions with big data analytics, Computers,
machine learning, in Proc. IEEE Int. Conf. Big Data, Materials and Continua, vol. 60, no. 1, pp. 179–192,
Atlanta, GA, USA, 2020, pp. 2254–2263. 2019.
[64] A. Abbasi, Z. Zhang, D. Zimbra, H. Chen, and J. F. N. Jr, [77] P. Tomar, S. Shrivastava, and U. Thakar, Ensemble
Detecting fake websites: The contribution of statistical learning based credit card fraud detection system, in
learning theory, MIS Quart., vol. 34, no. 3, pp. 435–461, Proc. 5th Conf. Information and Communication
2010. Technology, Kurnool, India, 2021, pp. 1–5.
[65] Q. Sun, T. Tang, H. Chai, J. Wu, and Y. Chen, Boosting [78] K. AbdulSattar and M. Hammad, Fraudulent transaction
fraud detection in mobile payment with prior knowledge, detection in FinTech using machine learning algorithms,
Appl. Sci., vol. 11, no. 10, p. 4347, 2021. in Proc. Int. Conf. Innovation and Intelligence for
[66] Y. Guo, J. Shi, Z. Cao, C. Kang, G. Xiong, and Z. Li, Informatics, Computing and Technologies (3ICT),
Machine learning based cloudbot detection using multi- Sakheer, Bahrain, 2020, pp. 1–6.
layer traffic statistics, in Proc. IEEE 21st Int. Conf. High [79] H. Zhou, G. Sun, S. Fu, W. Jiang, and J. Xue, A scalable
Performance Computing and Communications, IEEE 17th approach for fraud detection in online e-commerce
Int. Conf. Smart City, IEEE 5th Int. Conf. Data Science transactions with big data analytics, Comput. Mater.
and Systems, Zhangjiajie, China, 2019, pp. 2428–2435. Contin., vol. 60, no. 1, pp. 179–192, 2019.
[67] J. C. Mathew, B. Nithya, C. R. Vishwanatha, P. Shetty, [80] A. Roshan, A. Vyas, and U. Singh, Credit card fraud
H. Priya, and G. Kavya, An analysis on fraud detection in detection using choice tree technology, in Proc. 2nd Int.
credit card transactions using machine learning Conf. Electronics, Communication and Aerospace
techniques, in Proc. 2nd Int. Conf. Artificial Intelligence Technology, Coimbatore, India, 2018, pp. 1613–1619.
and Smart Energy, Coimbatore, India, 2022, pp. [81] F. Vanhoenshoven, G. Napoles, R. Falcon, K. Vanhoof,
265–272. and M. Koppen, Detecting malicious URLs using
[68] S. Khan, A. Alourani, B. Mishra, A. Ali, and M. Kamal, machine learning techniques, in Proc. IEEE Symp. Series
Developing a credit card fraud detection model using on Computational Intelligence, Athens, Greece, 2016, pp.
machine learning approaches, Int. J. Adv. Comput. Sci. 1–8.
Appl., vol. 13, no. 3, pp. 411–418, 2022. [82] A. Barahim, A. Alhajri, N. Alasaibia, N. Altamimi, N.
[69] K. Abhirami, A. K. Pani, M. Manohar, and P. Kumar, An Aslam, and I. U. Khan, Enhancing the credit card fraud
approach for detecting frauds in E-commerce transactions detection through ensemble techniques, J. Comput.
using machine learning techniques, in Proc. 2nd Int. Conf. Theor. Nanosci., vol. 16, no. 11, pp. 4461–4468, 2019.
Smart Electronics and Communication, Trichy, India, [83] A. S. Saputra and S. Suharjito, Fraud detection using
2021, pp. 826–831. machine learning in e-commerce, Int. J. Adv. Comput.
[70] A. S, N. Sethumadhavan, and H. N. AG, Credit card Sci. Appl., vol. 10, no. 9, pp. 332–339, 2019.
fraud detection using apache spark analysis, in Proc. 5th [84] W. Mostard, B. Zijlema, and M. Wiering, Combining
Int. Conf. Trends in Electronics and Informatics, visual and contextual information for fraudulent online
Tirunelveli, India, 2021, pp. 998–1002. store classification, in Proceedings IEEE/WIC/ACM
[71] K. N. Mishra and S. C. Pandey, Fraud prediction in smart International Conference on Web Intelligence, doi:
societies using logistic regression and K-fold machine 10.1145/3350546.3352504.
learning techniques, Wirel. Pers. Commun., vol. 119, no. [85] R. Sailusha, V. Gnaneswar, R. Ramesh, and G. R. Rao,
2, pp. 1341–1367, 2021. Credit card fraud detection using machine learning, in
[72] S. V. J. B. Gracia, J. G. Ponsam, S. Preetha, and J. G. K. Proc. 4th Int. Conf. Intelligent Computing and Control
Subhiksha, Payment fraud detection using machine Systems, Madurai, India, 2020, pp. 1264–1270.
learning techniques, in Proc. 4th Int. Conf. Computing [86] W. Mostard, B. Zijlema, and M. Wiering, Combining
and Communications Technologies, Chennai, India, visual and contextual information for fraudulent online
2021, pp. 623–626. store classification, in Proc. IEEE/WIC/ACM Int. Conf.
[73] S. Patil, V. Nemade, and P. K. Soni, Predictive modelling Web Intelligence, Thessaloniki, Greece, 2019, pp. 84–90.
for credit card fraud detection using data analytics, [87] S. Xuan, G. Liu, Z. Li, L. Zheng, S. Wang, and C. Jiang,
Procedia Comput. Sci., vol. 132, pp. 385–395, 2018. Random forest for credit card fraud detection, in Proc.
[74] R. F. Lima and A. C. M. Pereira, A fraud detection model IEEE 15th Int. Conf. Networking, Sensing and Control,
based on feature selection and undersampling applied to Zhuhai, China, 2018, pp. 1–6.
web payment systems, in Proc. IEEE/WIC/ACM Int. [88] S. K. Kalhotra, S. V. Dongare, A. Kasthuri, and D. Kaur,
Joint Conf. Web Intelligence and Intelligent Agent Data mining and machine learning techniques for credit
Technology, Singapore, 2015, pp. 219–222. card fraud detection, ECS Trans., vol. 107, no. 1, pp.
[75] G. K. Nune and P. V. Sena, Novel artificial neural 4977–4985, 2022.
networks and logistic approach for detecting credit card [89] T. Vairam, S. Sarathambekai, S. Bhavadharani, A. Kavi
deceit, Int. J. Comput. Sci. Netw. Secur., vol. 15, no. 9, Dharshini, N. Nithya Sri, and T. Sen, Evaluation of Naïve
pp. 21–27, 2015. bayes and voting classifier algorithm for credit card fraud
Abed Mutemi et al.: E-Commerce Fraud Detection Based on Machine Learning Techniques... 441
detection, in Proc. 8th Int. Conf. Advanced Computing [103] X. Liu, K. Yan, L. Burak Kara, and Z. Nie, CCFD-Net: A
and Communication Systems, Coimbatore, India, 2022, novel deep learning model for credit card fraud detection,
pp. 602–608. in Proceedings IEEE 22nd International Conference on
[90] I. Ali, K. Aurangzeb, M. Awais, R. J. u. H. Khan, and S. Information Reuse and Integration for Data Science, doi:
Aslam, An efficient credit card fraud detection system 10.1109/IRI51335.2021.00008.
using deep-learning based approaches, in Proc. IEEE [104] M. Zamini and G. Montazer, Credit card fraud detection
23rd Int. Multi-Topic Conf., Bahawalpur, Pakistan, 2020, using autoencoder based clustering, in Proc. 9th Int.
pp. 1–6. Symp. Telecommunications, Tehran, Iran, 2018, pp.
[91] K. Shin, T. Ishikawa, Y. L. Liu, and D. L. Shepard, 486–491.
Learning DOM trees of web pages by subpath kernel and [105] X. Liu, K. Yan, L. Burak Kara, and Z. Nie, CCFD-Net: A
detecting fake e-commerce sites, Mach. Learn. Knowl. novel deep learning model for credit card fraud detection,
Extr., vol. 3, no. 1, pp. 95–122, 2021. in Proc. IEEE 22nd Int. Conf. Information Reuse and
[92] Y. Dong, Z. Jiang, A. Mamoun, and P. M. Kumar, Real- Integration for Data Science, Las Vegas, NV, USA,
time fraud detection in e-market using machine learning 2021, pp. 9–16.
algorithms, J. Mult.-Valued Log. Soft Comput., vol. 36, [106] J. A. Smiles and T. Kamalakannan, Data mining based
nos. 1−3, pp. 191–209, 2021. hybrid latent representation induced ensemble model
[93] V. Mareeswari and G. Gunasekaran, Prevention of credit towards fraud prediction, in Proc. 3rd Int. Conf.
card fraud detection based on HSVM, in Proc. Int. Conf. Intelligent Sustainable Systems, Thoothukudi, India,
Information Communication and Embedded Systems, 2020, pp. 376–382.
Chennai, India, 2016, pp. 1–4. [107] M. Zhao, Z. Li, B. An, H. Lu, Y. Yang, and C. Chu,
[94] G. P. Santiago, A. C. M. Pereira, and R. Hirata, A Impression allocation for combating fraud in
modeling approach for credit card fraud detection in E-commerce via deep reinforcement learning with action
electronic payment services, in Proc. 30th Annu. ACM norm penalty, in Proc. 27th Int. Joint Conf. Artificial
Symp. Applied Computing, Salamanca, Spain, 2015, pp. Intelligence, doi:10.24963/ijcai.2018/548.
2328–2331. [108] A. Srivastava, M. Yadav, S. Basu, S. Salunkhe, and M.
[95] A. Mitra and M. Siddhant, Credit card fraud detection Shabad, Credit card fraud detection at merchant side
using autoencoders, YMER, vol. 21, no. 6, pp. 337–342, using neural networks, in Proc. 3rd Int. Conf. Computing
2022. for Sustainable Global Development, New Delhi, India,
[96] G. M. Rao and K. Srinivas, RNN-BD: An approach for 2016, pp. 667–670.
fraud visualisation and detection using deep learning, Int. [109] T. K. Behera and S. Panigrahi, Credit card fraud
J. Comput. Sci. Eng., vol. 25, no. 2, pp. 166–173, 2022. detection: A hybrid approach using fuzzy clustering &
[97] H. Huang, B. Liu, X. Xue, J. Cao, and X. Chen, neural network, in Proc. 2nd Int. Conf. Advances in
Imbalanced credit card fraud detection data: A solution Computing and Communication Engineering,
based on hybrid neural network and clustering-based doi:10.1109/ICACCE.2015.33.
undersampling technique, Applied Soft Computing, vol. [110] B. J. Ford, H. Xu, and I. Valova, A real-time self-
154, p. 111368, 2024. adaptive classifier for identifying suspicious bidders in
[98] J. Forough and S. Momtazi, Ensemble of deep sequential online auctions, Comput. J., vol. 56, no. 5, pp. 646–663,
models for credit card fraud detection, Appl. Soft 2013.
Comput., vol. 99, p. 106883, 2021. [111] C. Liu, Q. W. Zhong, X. Ao, L. Sun, W. L. Lin, J. H.
[99] N. T. N. Anh, T. Q. Khanh, N. Q. Dat, E. Amouroux, and Feng, Q. He, and J. Y. Tang, Fraud transactions detection
V. K. Solanki, Fraud detection via deep neural variational via behavior tree with local intention calibration, in Proc.
autoencoder oblique random forest, in Proceedings of 26th ACM SIGKDD Int. Conf. Knowledge Discovery and
2020 IEEE-HYDCON International Conference on Data Mining, Virtual Event, 2020, pp. 3035–3043.
Engineering in the 4th Industrial Revolution, HYDCON [112] S. Alqethami, B. Almutanni, and M. Alghamdi, Fraud
2020, doi: 10.1109/HYDCON48903.2020.9242753. detection in E-commerce, Int. J. Comput. Sci. Netw.
[100] A. K. Rai and R. K. Dwivedi, Fraud detection in credit Secur., vol. 21, no. 6, pp. 200–206, 2021.
card data using machine learning techniques, in Proc. 2nd [113] A. Maurya and A. Kumar, Credit card fraud detection
Int. Conf. Machine Learning, Image Processing, Network system using machine learning technique, in Proc. IEEE
Security and Data Sciences, Silchar, India, 2020, pp. Int. Conf. Cybernetics and Computational Intelligence,
369–382. Malang, Indonesia, 2022, pp. 500–504.
[101] N. T. N. Anh, T. Q. Khanh, N. Q. Dat, E. Amouroux, and [114] V. H. Khang, C. T. Anh, N. D. Thuan, and H. C. M. City,
V. K. Solanki, Fraud detection via deep neural variational Detecting fraud transaction using ripper algorithm
autoencoder oblique random forest, in Proc. IEEE- combines with ensemble learning model, International
HYDCON, Hyderabad, India, 2020, pp. 1–6. Journal of Advanced Computer Science and
[102] J. Wang and C. Wu, Camouflage is NOT easy: Applications, vol. 14, no. 4, p. 2023, 2023.
Uncovering adversarial fraudsters in large online app [115] Z. Li, M. Huang, G. Liu, and C. Jiang, A hybrid method
review platform, Measurement and Control, vol. 53, nos. with dynamic weighted entropy for handling the problem
9&10, pp. 2137–2145, 2020. of class imbalance with overlap in credit card fraud
442 Big Data Mining and Analytics, June 2024, 7(2): 419−444
detection, Expert Systems with Applications, vol. 175, pp. [129] B. Lebichot, T. Verhelst, Y. A. Le Borgne, L. He-
114750, 2021. Guelton, F. Oble, and G. Bontempi, Transfer learning
[116] V. H. Khang, C. T. Anh, and N. D. Thuan, Detecting strategies for credit card fraud detection, IEEE Access,
fraud transaction using ripper algorithm combines with vol. 9, pp. 114754–114766, 2021.
ensemble learning model, Int. J. Adv. Comput. Sci. Appl., [130] K. Huang, An optimized LightGBM model for fraud
vol. 14, no. 4, pp. 336–345, 2023. detection, J. Phys.: Conf. Ser., vol. 1651, no. 1, p.
[117] L. Zheng, G. Liu, C. Yan, C. Jiang, M. Zhou, and M. Li, 012111, 2020.
Improved TrAdaBoost and its application to transaction [131] P. Mrozek, J. Panneerselvam, and O. Bagdasar, Efficient
fraud detection, IEEE Trans. Comput. Soc. Syst., vol. 7, resampling for fraud detection during anonymised credit
no. 5, pp. 1304–1316, 2020. card transactions with unbalanced datasets, in Proc.
[118] B. B. Jayasingh and G. B. Sri, Online transaction IEEE/ACM 13th Int. Conf. Utility and Cloud Computing,
anomaly detection model for credit card usage using Leicester, UK, 2020, pp. 426–433.
machine learning classifiers, in Proc. Int. Conf. Emerging [132] A. K. Rai and R. K. Dwivedi, Fraud detection in credit
Smart Computing and Informatics, Pune, India, 2023, pp. card data using unsupervised machine learning based
1–5. scheme, in Proc. Int. Conf. Electronics and Sustainable
[119] R. Raja, K. K. Nagwanshi, S. Kumar, and K. R. Laxmi, Communication Systems, Coimbatore, India, 2020, pp.
Data Mining and Machine Learning Applications. 421–426.
Beverly, MA, USA: Scrivener Publishing, 2022. [133] Y. Lucas, P. E. Portier, L. Laporte, L. He-Guelton, O.
[120] Y. J. Lee, Y. R. Yeh, and Y. C. F. Wang, Anomaly Caelen, M. Granitzer, and S. Calabretto, Towards
detection via online oversampling principal component automated feature engineering for credit card fraud
analysis, IEEE Trans. Knowl. Data Eng., vol. 25, no. 7, detection using multi-perspective HMMs, Future Gener.
pp. 1460–1470, 2013. Comput. Syst., vol. 102, pp. 393–402, 2020.
[121] R. Saia, L. Boratto, and S. Carta, Multiple behavioral [134] Y. Fang, Y. Zhang, and C. Huang, Credit card fraud
models: A divide and conquer strategy to fraud detection detection based on machine learning, Comput. Mater.
in financial data streams, in Proc. 7th Int. Joint Conf. Contin., vol. 61, no. 1, pp. 185–195, 2019.
Knowledge Discovery, Knowledge Engineering and [135] R. Abiramy, K. Narayanan, R. Anandan, and C. S. Paul,
Knowledge Management, Lisbon, Portugal, 2015, pp. Fraud detection for online retail using random forest, Int.
496–503. J. Eng. Adv. Technol., vol. 8, no. 3S, pp. 1–6, 2019.
[122] G. A. Montazer and S. ArabYarmohammadi, Detection [136] R. Jhangiani, D. Bein, and A. Verma, Machine learning
of phishing attacks in Iranian e-banking using a fuzzy- pipeline for fraud detection and prevention in E-
rough hybrid system, Appl. Soft Comput., vol. 35, pp. commerce transactions, in Proc. IEEE 10th Annu.
482–492, 2015. Ubiquitous Computing, Electronics and Mobile
[123] D. Trisanto, N. Rismawati, M. F. Mulya, and F. I. Communication Conf., New York, NY, USA, 2019, pp.
Kurniadi, Effectiveness undersampling method and 135–140.
feature reduction in credit card fraud detection, Int. J. [137] U. Fiore, A. De Santis, F. Perla, P. Zanetti, and F.
Intell. Eng. Syst., vol. 13, no. 2, pp. 173–181, 2020. Palmieri, Using generative adversarial networks for
[124] M. Shao, N. Gu, and X. Zhang, Credit card transactions improving classification effectiveness in credit card fraud
data adversarial augmentation in the frequency domain, detection, Inf. Sci., vol. 479, pp. 448–455, 2019.
in Proc. 5th IEEE Int. Conf. Big Data Analytics, Xiamen, [138] R. Saia and S. Carta, A frequency-domain-based pattern
China, 2020, pp. 238–245. mining for credit card fraud detection, in Proc. of 2nd
[125] Z. Li, H. Wang, P. Zhang, P. Hui, J. Huang, J. Liao, J. International Conference on Internet of Things, Big Data
Zhang, and J. Bu, Live-streaming fraud detection: A and Security (IoTBDS-2017), doi:10.13140/RG.2.2.36578.
heterogeneous graph neural network approach, in Proc. [139] A. Shaji, S. Binu, A. M. Nair, and J. George, Fraud
27th ACM SIGKDD Int. Conf. Knowledge Discovery and detection in credit card transaction using ANN and SVM,
Data Mining, Singapore, 2021, pp. 3670–3678. doi: 10.1007/978-3-030-79276-3_14.
[126] S. Subbulakshmi and D. J. Evanjaline, An efficient [140] R. Saia, Unbalanced data classification in fraud detection
analytics in credit card fraud detection using resolution by introducing a multidimensional space analysis, in
classification (Rc) technique, Int. J. Sci. Technol. Res., Proc. 3rd Int. Conf. Internet of Things, Big Data and
vol. 9, no. 2, pp. 3284–3289, 2020. Security, Funchal, Portugal, 2018, pp. 29–40.
[127] H. Chi, Y. Lu, B. Liao, L. Xu, and Y. Liu, An optimized [141] A. Shaji, S. Binu, A. M. Nair, and J. George, Fraud
quantitative argumentation debate model for fraud detection in credit card transaction using ANN and SVM,
detection in E-commerce transactions, IEEE Intell. Syst., in Proc. 4th EAI Int. Conf. Ubiquitous Communications
vol. 36, no. 2, pp. 52–63, 2021. and Network Computing, Virtual Event, 2021, pp.
[128] K. N. Mishra, V. P. Mishra, S. Saket, and S. P. Mishra, 187–197.
Hybrid approach for deception tracing in smart cities [142] M. A. Jawed, D. K. Sasmal, and M. U. Khan, Credit card
using LR and n-fold intelligent machine learning fraud detection, http://localhost:8080/xmlui/handle/
techniques, Int. J. Manag. Pract., vol. 15, no. 4, pp. 123456789/14658, 2022.
460–487, 2022. [143] J. Lee, Y. C. Lee, and J. T. Kim, Fault detection based on
Abed Mutemi et al.: E-Commerce Fraud Detection Based on Machine Learning Techniques... 443
one-class deep learning for manufacturing applications review of credit card fraud detection techniques in
limited to an imbalanced database, J. Manuf. Syst., vol. electronic finance and banking, Iconic Research and
57, pp. 357–366, 2020. Engineering Journals, vol. 3, no. 2, pp. 456–467, 2019.
[144] M. A. Jawed, D. K. Sasmal, and M. U. Khan, Credit card [159] J. Liu, X. Gu, and C. Shang, Quantitative detection of
fraud detection, http://localhost:8080/xmlui/handle/ financial fraud based on deep learning with combination
123456789/14658, 2021. of E-commerce big data, Complexity, vol. 2020, p.
[145] L. Zhinin-Vera, Credit card fraud detection using 6685888, 2020.
artificial intelligence, doi:10.13140/RG.2.2.13642.18885. [160] F. S. Nezhad and H. R. Shahriari, Fuzzy logic and
[146] S. B. E. Raj and A. A. Portia, Analysis on credit card Takagi-Sugeno Neural-Fuzzy to Deutsche bank fraud
fraud detection methods in Proc. Int. Conf. Computer, transactions, in Proc. 7th Int. Conf. e-Commerce in
Communication and Electrical Technology, Tirunelveli, Developing Countries: With Focus on e-Security, Kish
India, 2011, pp. 152–156. Island, Iran, 2013, pp. 1–15.
[147] L. Moumeni, M. Saber, I. Slimani, I. Elfarissi, and Z.
[161] M. K. Khormuji, M. Bazrafkan, M. Sharifian, S. J.
Bougroun, Machine learning for credit card fraud
Mirabedini, and A. Harounabadi, Credit card fraud
detection, in Proc. 6th Int. Conf. Wireless Technologies,
detection with a cascade artificial neural network and
Embedded, and Intelligent Systems, Singapore, 2022, pp.
imperialist competitive algorithm, International Journal
211–221.
[148] A. S. Muttipati, S. Viswanadham, R. Dharavathu, and J. of Computer Applications, vol. 96, no. 25, pp. 1–9, 2014.
[162] L. Zhou, J. Dang, and Z. Zhang, Research on fault
Nema, LightGBM model for credit card fraud discovery,
diagnosis for on-board equipment of train control system
in Proc. 6th Int. Conf. Microelectronics, Electromagnetics
based on imbalanced text classification, J. Appl. Sci.
and Telecommunications, Singapore, 2022, pp. 51–58.
[149] A. Mohari, J. Dowerah, K. Das, F. Koucher, and D. J. Eng., vol. 24, no. 2, pp. 167–175, 2021.
[163] J. Wang, R. Wen, C. Wu, Y. Huang, and J. Xiong,
Bora, Credit card fraud detection techniques: A review,
FDGars: Fraudster detection via graph convolutional
in Soft Computing for Intelligent Systems, N. Marriwala,
networks in online app review system, in Proc. Web
C. C. Tripathi, S. Jain, and S. Mathapathi, eds.
Conference 2019 - Companion of the World Wide Web
Singapore: Springer, 2022, pp. 157–166.
[150] V. N. Dornadula and S. Geetha, Credit card fraud Conference, WWW 2019, doi: 10.1145/3308560.3316
detection using machine learning algorithms, Procedia 586.
Computer Science, doi: 10.1016/j.procs.2020.01.057. [164] J. Wang, R. Wen, C. Wu, Y. Huang, and J. Xiong,
[151] B. Al-Smadi, Credit card security system and fraud FdGars: Fraudster detection via graph convolutional
detection algorithm, PhD dissertation, Louisiana Tech networks in online app review system, in Proc. World
University, Ruston, LA, USA, 2021. Wide Web Conf., San Francisco, CA, USA, 2019, pp.
[152] V. N. Dornadula and S. Geetha, Credit card fraud 310–316.
detection using machine learning algorithms, Procedia [165] R. Kawase, F. Diana, M. Czeladka, M. Schüler, and M.
Comput. Sci., vol. 165, pp. 631–641, 2019. Faust, Internet fraud: The case of account takeover in
[153] Y. Sahin and E. Duman, Detecting credit card fraud by online marketplace, in Proc. 30th ACM Conf. Hypertext
ANN and logistic regression, in Proc. Int. Symp. and Social Media, Hof, Germany, 2019, pp. 181–190.
Innovations in Intelligent Systems and Applications, [166] P. Pant, P. Srivastava, and A. Gupta, Provisional research
Istanbul, Turkey, 2011, pp. 315–319. on ensemble learning techniques for card fraud detection,
[154] S. L. Marie-Sainte, M. B. Alamir, D. Alsaleh, G. Albakri, Int. J. Eng. Adv. Technol., vol. 8, no. 6S, pp. 13–17,
and J. Zouhair, Enhancing credit card fraud detection 2019.
using deep neural network, in Proc. 2020 Computing [167] W. H. Chang and J. S. Chang, A novel two-stage phased
Conf. Intelligent Computing, Switzerland, 2020, pp. modeling framework for early fraud detection in online
301–313. auctions, Expert Syst. Appl., vol. 38, no. 9, pp.
[155] M. Puh and L. Brkic, Detecting credit card fraud using 11244–11260, 2011.
selected machine learning algorithms, in Proc. 42nd Int. [168] J. S. Chang and W. H. Chang, Analysis of fraudulent
Convention on Information and Communication behavior strategies in online auctions for detecting latent
Technology, Electronics and Microelectronics, Opatija, fraudsters, Electron. Commer. Res. Appl., vol. 13, no. 2,
Croatia, 2019, pp. 1250–1255. pp. 79–97, 2014.
[156] S. K. Hashemi, S. L. Mirtaheri, and S. Greco, Fraud [169] S. S. Bhakta, S. Ghosh, and B. Sadhukhan, Credit card
detection in banking data by machine learning fraud detection using machine learning: A comparative
techniques, IEEE Access, vol. 11, pp. 3034–3043, 2023. study of ensemble learning algorithms, in Proc. 9th Int.
[157] B. Chugh and N. Malik, Machine learning classifiers for Conf. Smart Computing and Communications (ICSCC),
detecting credit card fraudulent transactions, in Kochi, India, 2023, pp. 296–301.
Information and Communication Technology for [170] Z. Faraji, A review of machine learning applications for
Competitive Strategies, A. Joshi, M. Mahmud, and R. G. credit card fraud detection with a case study, SEISENSE
Ragel, eds. Singapore: Springer, 2023, pp. 223–231. Journal of Management, vol. 5, no. 1, pp. 49–59, 2022.
[158] U. L. Chilaka, G. A. Chukwudebe, and A. Bashiru, A [171] G. Douzas and F. Bacao, Effective data generation for
444 Big Data Mining and Analytics, June 2024, 7(2): 419−444