CASE STUDY
Leveraging Transactional Data for Micro and Small Enterprise (MSE) Lending
March 2024 • Dean Caire, Maria Fernandez Vidal
This note shares two case studies providing evidence of the value of transactional data
for credit scoring of different types of micro and small enterprises. The financial service
providers are Indian fintechs Fundfina, which offers credit to small shops, and KarmaLife,
which provides credit for platform workers. The evidence resulting from the credit scoring
models developed and evaluated for this research supports the following messages:
(1) Transactional data can have similar predictive power in credit scoring to credit history.
(2) Combining transactional data with credit history can result in better predictions than
either of these data sets by themselves. (3) These results hold under different circumstances,
including for both different types of MSEs and different types of credit histories.
Introduction
The use of transactional data for credit underwriting can play a part in closing the estimated
US$4.9 trillion global financing gap for MSEs[1]. It can help expand access to MSEs without
formal lending history, and it can improve product fit by more accurately estimating repayment
capacity. This paper contributes to building the evidence base for this by measuring the
predictive power of transactional data in credit scoring models.

[1] Deck: the promise of fintech for micro and small enterprises

Transactional data is the record generated by a person or firm’s operations, and in this case
study, we will focus on MSE transactions both for micro and small businesses and micro
entrepreneurs and platform workers. Some common types of transactional data include financial
transactions about sales, expenses, orders and invoices. Transactional data can also include a
variety of other information such as activity records, inventory records, travel records or
customer ratings, depending on the activity
of the MSE. Much like bill and loan repayment activity recorded by credit bureaus, transactional
data trails provide an objective record of a potential borrower’s financial behavior that enables
reliable estimates of the ability to repay a loan.

This paper examines the data and experience of two fintechs in India that use different sets of
transactional data for credit scoring. Fundfina uses transactional data from enterprise partners
to offer credit to small businesses, primarily fast-moving consumer goods (FMCG) shops and
financial services agents. KarmaLife uses transactional data from partner platforms to offer
loans to drivers and food delivery platform workers. While it is not novel to recognize that
transactional data can be of value in credit assessments, the details of how credit providers use
it are generally kept confidential. Direct comparisons between how transactional and credit
history data contribute to the predictive power of credit scoring models for a given borrower
population are rarely shared.

The data analysis and results presented in this paper are not based on the fintechs’ proprietary
machine learning models[2]. Instead, we use their data sets and consistently apply the scorecard
development methodology most widely used in the credit scoring industry (logistic regression
models) to develop “traditional” credit scorecards. With this methodology, the relationships
between each model characteristic and loan repayment are easy to understand, compare and present
with “scorecard points”. Such model transparency facilitates scorecard and credit risk
management and helps to compare the contribution of different data sets to a scorecard’s
predictive power.

[2] Financial service providers and credit scoring vendors generally protect the details of the
credit scoring models they develop as their intellectual property. Consultants working with such
proprietary models are bound by confidentiality agreements.

Fundfina

Fundfina is a fintech in India that partners with fast-moving consumer goods (FMCG) suppliers
and agent networks to access transactional data to provide loans for MSEs. For the analysis, we
used a representative sample of over 5,000 loans issued by Fundfina to MSEs under different
partnerships.

Fundfina estimates that about 80% of their customers do not have a formal credit history. In
CGAP’s customer research, conducted with 852 of their customers, 62% of respondents said they
did not previously have access to a loan or credit as offered by Fundfina. Given that the
majority of customers were new to formal credit, Fundfina did not collect credit bureau data for
these clients during its underwriting process. Our analysis therefore uses customer loan
repayment history with Fundfina as a measure of “credit history”. Sixty percent of the loans in
our sample were issued to customers with Fundfina credit history, or “repeat customers”.

[3] In other words, a characteristic adds value to a credit scoring model if it improves
out-of-sample prediction, that is, if it helps to accurately predict the outcome for cases that
were not used to develop the model.
[4] The area under the curve (AUC) statistic, together with its numeric equivalent the “gini
coefficient” (where gini = AUC*2 – 1), is a common metric for evaluating the risk-ranking power
of single variables and multi-factor scorecards. It is a general measure of predictive power
across the whole distribution of scores and is useful for comparing the predictive power of
models built from a given data set. It is not directly comparable across data sets and is only
one of several measures that should be used to judge the quality of credit scorecards (such as
model interpretability, stability and credit-risk assessment comprehensiveness).

Table 1 shows the comparative predictive power of three logistic regression credit scoring
models built from transactional and/or credit history characteristics. Credit scoring models are
best judged by accuracy of out-of-sample prediction[3], so for ease of presentation, we compare
the models we build using the Area Under the Curve (AUC), a popular measure of a model’s overall
predictive power[4]. AUC ranges from 0.50 (random) to 1 (perfect prediction), where a higher
number indicates better prediction. Although the AUC statistic is not directly comparable for
models built for different borrower populations, credit scorecards with an AUC of around 0.70 or
higher are likely to be considered
useful in credit decisioning. A higher AUC will usually mean that, for a given strategy, a lender
can approve more total borrowers for a particular risk appetite or acceptable delinquency rate.

TABLE 1. Comparative prediction by data types used in models
Model characteristics | AUC (sample of only repeat customers) | AUC (full sample)
Fundfina loan repayment history | 0.74 | 0.69
Transactional features (past transaction count, volume and stability) | 0.74 | 0.76
Transactional features and Fundfina loan repayment history | 0.84 | 0.82

As we can see in the table, a model built using transactional data alone has predictive power
comparable to a model built solely on credit history data, which is generally considered the
best single type of data to predict future loan repayment[5]. The combination of both data
sources offers a better prediction than either source on its own. Table 1 presents model results
for two different samples: the full sample, where 60% of clients have credit history; and a
smaller sample that only includes the clients with credit history. From the repeat customer
sample, we can see that, when people have both data sources available, the predictive power is
the same for either credit history or transactional data on its own, but that a model based on
the combination of both types of data is stronger. For the full sample, the transactional data
performs better than the credit history data, since part of the borrowers in the sample had no
credit history data. This shows that transactional data can be an effective tool for financial
inclusion, allowing providers to underwrite those that are new to credit.

[5] An exemplary quote is: “It makes sense that best way to tell if somebody will repay a loan in
future is to see if they have repaid one in the past”, attributed to Miriam Bruhn of the World
Bank in the article “Tests of character.”

Fundfina also collects other customer data that we grouped into categories:
• Demographic: borrower personal characteristics such as marital status, degree type, job type
  and home ownership.
• Enterprise: specific enterprise characteristics such as the main line of business and income
  stability.
• Enterprise partner: characteristics of the enterprise partner with which the borrower works,
  such as quality of customer service.

To compare the relative predictive power of these additional types of data, we used the full
data set of new and repeat borrowers. Table 2 shows analogous results for models built with
demographic, enterprise and enterprise partner characteristics.

TABLE 2. Comparative prediction by data types used in the models
Model characteristics | AUC
Enterprise characteristics | 0.52
Demographics | 0.56
Enterprise partner characteristics | 0.64
Transactional data | 0.76
Transactional, demographics, enterprise and enterprise partner | 0.77

As shown in the table above, transactional data outperforms all of these other tested data
types. Combining any or all of them with the transactional data only minimally improves the
predictive power when compared to transactional data alone.

Finally, the predictive power and relationships of transactional characteristics to loan
repayment were stable across Fundfina’s four main enterprise partners (Table 3). That
transactional patterns are predictive of loan repayment irrespective of the type of business
that generates the transaction history suggests that the value of transactional data will be
similar across different types of businesses.

TABLE 3. Predictive power of transactional scorecard across enterprise partners
Partner | Sample size[a] | AUC
1 | 1,925 | 0.77
2 | 1,341 | 0.75
3 | 1,108 | 0.77
4 | 667 | 0.80
[a] Because we are using AUC, we also share the sample size, as absolute AUC numbers are
sensitive to the size of a data set and particularly to the share of bad loans: the AUC number
is often higher when there are fewer bad loans. This higher AUC number does not mean the model
is better (or worse) than models built with other data sets.

BOX 1. Customer impact
In order to learn more about what transactional-based lending can mean for customers, CGAP
conducted research with a sample of 852 Fundfina customers. The research showed some promising
results for the impact on MSEs.

The majority of customers were new to credit. Sixty-two percent of respondents did not have
access to a similar form of credit before receiving loans from Fundfina. Most customers felt
they did not have easy access to other good options for borrowing. When asked if they could
easily find a good alternative to Fundfina’s loans, 73% of customers responded “No”, 9% “Maybe”
and only 18% “Yes.”

Customers reported that loans had a positive impact on their businesses:
• Ninety percent of customers described an increase in money earned because of the loan, with
  21% citing significant improvements.
• Eighty-nine percent of customers also reported that their business had grown as a result of
  the loan.
• Access to the loan also improved their ability to manage business finances (80%) and pay bills
  on time (77%).

Having access to credit decreased the amount of time people reported spending worrying about
their finances. Seventy-four percent of customers expressed a decrease in financial stress (15%
significantly; 59% slightly), while only 4% reported an increase, with the other 22% seeing no
change. Finally, while 12% of customers were somewhat burdened by the repayment and 3% felt it
was a heavy burden, the majority of customers (85%) did not feel repayments were a problem. This
may be in part due to the fintech’s policy of repayment through small daily amounts rather than
a large payment at the end of the loan, which could be more challenging for businesses that are
new to credit and have variable revenue streams.

KarmaLife
KarmaLife is an India-based fintech start-up that serves gig platform workers and broader pools
of blue-collar workers with different liquidity and savings solutions. Mobility segment workers
most commonly cite the need for higher-ticket, instalment-linked loans, often to finance a
vehicle or repairs, which enable them to earn more from ride-hailing work.

KarmaLife has developed its scoring models based on partner platform data. Those models have
proven predictive for their target segment. KarmaLife’s experience shows that early-wage access
can lead to greater driver engagement, productivity and retention, creating incentives for
platforms to extend credit, directly or indirectly, to their drivers. Initial results suggest
longer-tenure loans improve driver engagement in the immediate week after getting the loan.
Based on a cohort of 8,000 platform workers, 93% of workers who took a loan were available to
work the next week, in comparison to 85% of workers who
did not take out a loan despite being eligible. The analogous figures six weeks after loan
eligibility were 95% for borrowers and 89% for non-borrowers. This suggests that tailored
financial services may offer gig worker platforms a way to serve, engage and retain their best
workers.

More transactional activity, particularly more frequent and stable activity, is associated with
better loan repayment. KarmaLife’s experience shows that greater earnings, more working hours,
and higher driver ratings are all associated with lower repayment risk. In our analysis based on
15,000 loans to drivers for Porter, a truck and bike delivery platform, we found that platform
data worked as well as credit bureau data in predicting a driver’s creditworthiness. Platform
data also materially improved the predictive accuracy of credit models using bureau data.

TABLE 4. Comparative prediction by data types
Model characteristics | AUC
Credit bureau data | 0.67
Transactional data | 0.66
Transactional and bureau data | 0.71

Unlike the case of Fundfina, the vast majority (>95%) of platform workers in the KarmaLife
dataset had a file in the credit bureau[6].

[6] While KarmaLife does lend to those without a credit history, the sample we received was for a
loan where having credit history in the bureau was a pre-condition for receiving a loan, and
presumably the 5% of clients without such a file were approved on an exceptional basis.

The relationship between KarmaLife’s platform data and loan repayment makes ‘business sense’. In
general, more activity on the platform is associated with lower repayment risk. Different
measures of platform activity are also correlated, such that models can reach maximum prediction
for a given data set without including all possible characteristics or engineered “features” in
the scorecard. Table 5 provides examples of some of the most predictive platform data
characteristics in the Porter data set and their predictive power as measured by the AUC
statistic.

TABLE 5. How platform activity is related to loan repayment
Characteristic | Relationship to repayment | AUC
Sum of digital earnings | Greater earnings are associated with lower repayment risk | 0.60
Account suspension count | Account suspensions are associated with higher repayment risk | 0.58
Driver rating | Higher ratings are associated with lower repayment risk | 0.58
Login hours | More login hours are associated with lower repayment risk | 0.57
Negative balance count | Negative balances are associated with higher repayment risk | 0.56

Together the Fundfina and KarmaLife cases support the hypotheses that: (i) transactional data
can be used as a reliable predictor of loan repayment for clients with no formal credit history;
and (ii) such data are likely to improve the prediction of bureau scores or models based only on
credit history data.

Conclusion
The two case studies present evidence that transactional data have the potential to predict
credit risk as well as the “gold standard” of credit history. Transactional data are also likely
to enhance the predictive power of credit scorecards when combined with other types of data. By
leveraging transactional data, financial service providers may be able to expand access to
credit for new borrowers, including those new to formal credit, without taking on significant
additional risk. The results in this brief hold for different types of retail businesses and gig
workers, indicating the approach can be applicable across different sectors. Customer research
suggests that loans received based on transactional data reached customers who were previously
excluded from access to formal credit and confirms the positive impact that access to credit can
have on small businesses.

This note highlights opportunities to use transactional data to further financial inclusion.
Traditionally, accessing transactional data has required financial service providers to
negotiate partnerships with businesses with access to such data for large groups of customers.
While the models presented in this case study rely on fintechs which have developed partnerships
with companies that share data on their MSE network, similar data can now be accessed through
open finance schemes in some markets. Providers can proactively seek sources of this data to
grow their portfolios towards underserved segments. Providers can seek partnerships with
non-financial organizations to provide embedded finance products using transactional data. Where
available, providers can join open finance regimes and develop credit products geared towards
those with no or limited credit history, as well as improve the accuracy of their assessments
for all customers. As these data-sharing schemes that empower customers to share their own data
grow, the opportunities to use transactional data to better predict risk and expand access to
more customers will grow exponentially.
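
As a technical aside on the metric used throughout this note, the sketch below shows one way the AUC statistic and the gini conversion from footnote 4 (gini = AUC*2 – 1) can be computed, and how scorecards built from different data types can be compared on the same loans. It is a minimal illustration only, not the code used for this research. The function names (`auc`, `gini`), the eight toy loans and all scores are invented for the example, and the simple averaging of two scores is an assumed stand-in for refitting a single logistic regression on both data sources.

```python
from itertools import product


def auc(scores, is_bad):
    """Share of (good, bad) loan pairs in which the good loan has the higher score.

    Ties count as half a pair, so a score that cannot separate the two groups
    at all gives 0.50 and a perfect ranking gives 1.0.
    """
    good = [s for s, flag in zip(scores, is_bad) if not flag]
    bad = [s for s, flag in zip(scores, is_bad) if flag]
    if not good or not bad:
        raise ValueError("need at least one good and one bad loan")
    wins = sum(
        1.0 if g > b else 0.5 if g == b else 0.0
        for g, b in product(good, bad)
    )
    return wins / (len(good) * len(bad))


def gini(auc_value):
    """Convert AUC to the gini coefficient: gini = AUC*2 - 1."""
    return 2 * auc_value - 1


# Invented toy data (not from the study): 5 repaid loans, then 3 bad loans.
IS_BAD = [False, False, False, False, False, True, True, True]
# Hypothetical scores from two single-source scorecards (higher = lower risk).
HISTORY_SCORE = [70, 60, 55, 40, 65, 50, 45, 62]
TRANSACTION_SCORE = [66, 58, 48, 61, 52, 50, 57, 41]
# Assumption: averaging stands in for one model fitted on both data sources.
COMBINED_SCORE = [(h + t) / 2 for h, t in zip(HISTORY_SCORE, TRANSACTION_SCORE)]

# AUC of each hypothetical scorecard, measured on the same set of loans.
RESULTS = {
    name: auc(scores, IS_BAD)
    for name, scores in [
        ("credit history", HISTORY_SCORE),
        ("transactional", TRANSACTION_SCORE),
        ("combined", COMBINED_SCORE),
    ]
}

if __name__ == "__main__":
    for name, value in RESULTS.items():
        print(f"{name:15s} AUC={value:.2f}  gini={gini(value):.2f}")
```

The toy numbers say nothing about real portfolios; they only show the mechanics of comparing scorecards on one sample. Applying the same conversion to Table 1, an AUC of 0.74 corresponds to a gini of 0.48 and an AUC of 0.84 to a gini of 0.68.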
Acknowledgments
The authors would like to thank the following colleagues: Will Cook and Tatiana Alonso Gispert
for peer review; Xavier Faz and Arisha Salman for input and guidance; and Feven Getachew Asfaw
for editorial support. They would also like to thank Badal Malick, Sachin Tripathi, and
Siddharth Singh of KarmaLife and Nishant Bhaskar and Abhijit Naik of Fundfina for contributing
samples of their anonymized data, as well as their valuable time and insights to the study.

Rights and Permissions
This work is available under the Creative Commons Attribution 4.0 International Public License
(https://creativecommons.org/licenses/by/4.0/).

Attribution: Cite the work as follows: Caire, Dean and Maria Fernandez Vidal. 2024. “Leveraging
Transactional Data for Micro and Small Enterprise (MSE) Lending.” Case Study. Washington, D.C.:
CGAP. https://www.cgap.org/research/leveraging-transactional-data-for-micro-and-small-enterprise-lending

All queries on rights and licenses should be addressed to CGAP Publications, 1818 H Street, NW,
MSN F3K-306, Washington, DC 20433 USA; e-mail: cgap@worldbank.org.