Exploring the Biases in
Search Results & Behavioral
Analysis using Drift Diffusion
Models
Harnoor Singh (230441)
Team Member Names: Sundar Shyam (221049)
Rishab (220885)
Contribution: Whole Section B
Instructor Name: Pragathi P. Balasubramani
Course Name: CGS616: HUMAN CENTERED COMPUTING
Table of Contents
1. Introduction
2. Part A: Exploring the Biases in Search Results
2.1 Methodology
2.2 WordCloud Analysis
2.3 Google’s Behavior Across India, Netherlands, and USA
2.4 Possible Causes for Biases
2.5 Recommendations
2.6 Limitations and Challenges
3. Part B: Behavioral Analysis Using Drift Diffusion Models
3.1 Methodology
3.2 Results and Analysis
3.3 Interpretation of Results
4. Conclusion
5. References
6. Appendices
○ Appendix A: Code
○ Appendix B: Data Files (CSV)
1. Introduction
The purpose of this report is to explore and analyze two distinct topics:
● Part A: The biases present in search results across multiple
search engines (e.g., Google, Bing, DuckDuckGo).
● Part B: Behavioral analysis of online shoppers using Drift
Diffusion Models (DDM) based on the Online Shoppers Purchasing
Intention dataset from the UCI repository.
In Part A, the findings focus on identifying biases, understanding the
causes behind them, and providing recommendations for reducing such
biases.
In Part B, we will analyze purchasing intentions to determine which
factors influence consumer behavior and how quickly decisions are
made based on product types.
Part A: Exploring the Biases in Search Results
2.1. Methodology
Data was collected using web scraping techniques applied to four major
search engines:
● Google
● Bing
● DuckDuckGo
● Yahoo
The scraping was performed using Selenium and BeautifulSoup, with
nine queries related to the topic “Is Facebook listening to our
conversations?”. Searches were conducted across three countries
(India, the Netherlands, and the USA) using incognito browsing to ensure
neutrality; a minimal sketch of this scraping step is given after the list
below. All collected data was consolidated into a single CSV file for
further analysis, covering:
1. Frequent words in the search results (presenting WordClouds).
2. Frequency and ordering of results, links and domains.
3. Diversity of perspectives.
4. Sentiment analysis.
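The snippet below is a minimal sketch of how such scraping could be set up, not the exact project code: the query list, the CSS selector, and the output file name are assumptions, and only Bing is shown for brevity.

```python
# Minimal sketch of the scraping step (illustrative only): the query list,
# CSS selector, and output file name are assumptions, and only Bing is shown.
import csv
from urllib.parse import quote_plus
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

QUERIES = ["Is Facebook listening to our conversations?"]  # the study used nine related queries

options = Options()
options.add_argument("--incognito")              # incognito browsing for neutrality
driver = webdriver.Chrome(options=options)

rows = []
for query in QUERIES:
    driver.get("https://www.bing.com/search?q=" + quote_plus(query))
    soup = BeautifulSoup(driver.page_source, "html.parser")
    for result in soup.select("li.b_algo"):      # Bing's organic-result blocks (subject to change)
        title, link = result.find("h2"), result.find("a")
        if title and link:
            rows.append({"engine": "Bing", "query": query,
                         "title": title.get_text(strip=True),
                         "url": link.get("href", "")})
driver.quit()

with open("search_results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["engine", "query", "title", "url"])
    writer.writeheader()
    writer.writerows(rows)
```

The same loop would be repeated per engine and per country, with the consolidated rows appended to one CSV.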
2.2. WordCloud Analysis
● Method: WordClouds were generated to visualize the most
frequent words in the search results.
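As an illustration of this step, the sketch below builds one word cloud per engine from the consolidated CSV; the file path and column names ("engine", "title") are assumptions carried over from the scraping sketch above.

```python
# Sketch of word-cloud generation from the consolidated results; the file
# name and column names are assumptions, not the exact project code.
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

df = pd.read_csv("search_results.csv")
for engine, group in df.groupby("engine"):
    text = " ".join(group["title"].dropna().astype(str))
    cloud = WordCloud(width=800, height=400, background_color="white",
                      stopwords=STOPWORDS).generate(text)
    plt.figure(figsize=(8, 4))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(f"Word cloud: {engine}")
    plt.show()
```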
[Word cloud figures: Google, Bing, DuckDuckGo, Yahoo]
Frequency of Domains
Analysis: The frequency of websites/domains appearing in the search results
was calculated. Certain engines, like Google, favored established domains (e.g.,
news outlets), while DuckDuckGo included more niche perspectives.
The frequency analysis of websites/domains for search results across the four
search engines is summarized in a spreadsheet; the full dataset can be accessed
via the link below:
https://docs.google.com/spreadsheets/d/1qOT8Pn138kcOp-_0TK0aZavp-vcb1ed
xHA4a4zXCo3I/edit?usp=drive_link
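A minimal sketch of how such domain counts can be derived from the consolidated CSV is shown below; the file and column names are assumptions.

```python
# Sketch of the domain-frequency count; file and column names are assumptions.
from urllib.parse import urlparse
import pandas as pd

df = pd.read_csv("search_results.csv")
df["domain"] = df["url"].astype(str).map(lambda u: urlparse(u).netloc.removeprefix("www."))
domain_counts = df.groupby("engine")["domain"].value_counts()
print(domain_counts.groupby(level="engine").head(10))   # top domains per engine
```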
Diversity of Perspectives: DuckDuckGo consistently showed a broader range
of perspectives compared to Google, which often displayed results from major
corporations or media outlets.
[Perspective-distribution charts: Google, Bing, DuckDuckGo, Yahoo]
Sentiment Analysis
● Method: Sentiment analysis was applied to the titles and descriptions of
search results. This helped in determining whether results leaned towards
positive, negative, or neutral sentiment.
Polarity:
Polarity measures the sentiment of the text on a scale from -1 to +1:
+1 (Positive): Very positive sentiment (e.g., "I love this product!")
0: Neutral sentiment (e.g., "This is a book.")
-1 (Negative): Very negative sentiment (e.g., "I hate this product!")
Interpretation:
0 to +0.2: Slightly positive sentiment.
+0.2 to +0.5: Positive sentiment, but not overly enthusiastic.
+0.5 to +1: Strongly positive sentiment.
0 to -0.2: Slightly negative sentiment.
-0.2 to -0.5: Negative sentiment, but not highly negative.
-0.5 to -1: Strongly negative sentiment.
Subjectivity:
Subjectivity measures how much of the text is based on personal opinions or
feelings, versus factual information. It ranges from 0 to 1:
0: Very objective (factual, unbiased).
1: Very subjective (opinionated, emotional).
Interpretation:
0 to 0.3: Mostly objective content (fact-based, unbiased).
0.3 to 0.6: Moderately subjective content (contains both facts and opinions).
0.6 to 1: Mostly subjective content (opinion-based, emotional).
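The polarity and subjectivity scales described above match those reported by TextBlob, so the sketch below assumes TextBlob was the scoring tool; the file and column names are the same assumptions as in the earlier sketches.

```python
# Sentiment-scoring sketch; TextBlob is assumed because its polarity
# (-1 to +1) and subjectivity (0 to 1) scores match the scales above.
import pandas as pd
from textblob import TextBlob

df = pd.read_csv("search_results.csv")

def sentiment(text):
    blob = TextBlob(str(text))
    return pd.Series({"polarity": blob.sentiment.polarity,
                      "subjectivity": blob.sentiment.subjectivity})

scores = df["title"].apply(sentiment)
print(pd.concat([df["engine"], scores], axis=1).groupby("engine").mean())
```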
Sentiment scores by search engine:

Search Engine   File Name    Polarity          Subjectivity
Google          Google.txt   -0.0135761201     0.4817440769
Yahoo           Yahoo.txt    -0.1175324675     0.602987013
DuckDuckGo      Duck.txt     -0.08151755652    0.6006859267
Bing            Bing.txt     -0.07211317723    0.5392102847
Differentiation Between Search Engines
The comparison between search engines revealed:
● Frequency of Results: Google provided the highest number of
results, while DuckDuckGo emphasized privacy by limiting
unnecessary data exposure.
● Content Focus: Yahoo often prioritized entertainment-related
content, while Google and Bing provided more professional/academic
results.
● Diversity of Perspectives: DuckDuckGo consistently showed a
broader range of perspectives compared to Google, which often
displayed results from major corporations or media outlets.
2.3. Google’s Behavior Across India, Netherlands, and
USA
Google’s search results were analyzed specifically for changes in
behavior across the three countries:
● India: Results emphasized regional news outlets and local
commercial sites, reflecting strong localization.
● Netherlands: Search results tended to include a mix of local
news, EU-centric content, and English-language sources, with
less commercialization compared to the USA.
● USA: Results leaned heavily toward commercial content and
advertisements, with major news outlets dominating the
rankings.
[Per-country figures for Google (India, USA, Netherlands): word cloud, domain frequency, perspective analysis, and sentiment analysis]
2.4. Possible Causes for Biases
The observed biases in search results may arise from several factors:
1. Algorithm Design: Search engines prioritize specific types of content
(e.g., commercial, academic) based on their algorithms.
2. Localization: Regional variations influence the results displayed in
different countries.
3. Personalization: Despite incognito mode, algorithms may still factor in
broader geographic trends.
4. Economic Influences: Sponsored content tends to dominate in markets
like the USA.
2.5. Recommendations
To mitigate these biases, the following steps are recommended:
1. Algorithm Transparency: Search engines should disclose their ranking
criteria.
2. Inclusion of Diverse Sources: Actively include a mix of perspectives in
search results.
3. User Education: Educate users about biases and how search engines
prioritize content.
2.6. Limitations and Challenges
1. Technical Limitations: Some engines (e.g., Google) actively block
scraping attempts.
2. Subjectivity: Analysis of “bias” involves subjective judgment, which may
affect the results.
3. Data Volume: Processing a large volume of data across multiple queries
and countries was resource-intensive.
Part B: Behavioral Analysis Using Drift Diffusion Models
3.1. Methodology
The Online Shoppers Purchasing Intention dataset was used to model
purchasing behavior using Drift Diffusion Models (DDMs). The following steps
were undertaken:
1. Data Cleaning:
○ The dataset was preprocessed by filtering for completed purchases
(Revenue = True) and removing missing or inconsistent entries.
○ Columns irrelevant to the analysis (e.g., session identifiers) were
excluded.
2. Modeling:
○ Python libraries like pyddm, scipy, and pandas were used to apply
the Drift Diffusion Model (a simplified sketch of this step is given
after this list).
○ The drift rate (decision speed), mean response time, and standard
deviation were calculated for each product category.
○ The analysis focused on identifying behavioral patterns in purchases
based on product types.
3. Visualization:
○ Histograms of response times were generated for different product
types to observe their distribution.
○ Scatter plots were used to compare drift rates and response times,
offering a visual correlation between product types and
decision-making speeds.
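As referenced above, the following is a simplified sketch of the cleaning and per-category summary, not the project’s actual pyddm model: the "product_type" and "response_time" columns are hypothetical names for fields derived during preprocessing, and the drift rate is approximated from the constant-drift relation (mean decision time ~ boundary / drift) rather than a full pyddm fit.

```python
# Simplified sketch of cleaning and per-category summaries (not the actual
# pyddm fit): "product_type" and "response_time" are hypothetical names for
# columns derived during preprocessing of the UCI data.
import pandas as pd

df = pd.read_csv("online_shoppers_intention.csv")
purchases = df[df["Revenue"] == True].copy()          # completed purchases only

summary = purchases.groupby("product_type")["response_time"].agg(["mean", "std"])

# Drift-rate proxy: with a fixed decision boundary, a constant-drift process
# has mean decision time proportional to boundary / drift.
boundary = 1.0                                        # fixed at 1 for illustration
summary["drift_rate_proxy"] = boundary / summary["mean"]
print(summary.sort_values("drift_rate_proxy", ascending=False))
```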
3.2. Results and Analysis
Behavioral Analysis Report - Online Shoppers Purchasing Intention
[Drift-rate results figure omitted]
Response Times for Purchases
● Mean Response Time: 7.44 seconds
● Standard Deviation: 4.78 seconds
These results indicate that, on average, users take about 7.44 seconds to
complete a purchase. However, there is variability in this behavior, as highlighted
by the standard deviation.
Drift Rates by Product Types
● Health & Beauty: Products in this category exhibited the highest drift
rates, indicating faster decision-making.
● Electronics: Lower drift rates were observed for this category, suggesting
users take more time to research and deliberate before purchasing.
● Home & Garden: This category displayed the slowest decision-making
times, pointing to greater consumer deliberation.
Visualizations
1. Histograms:
Histograms for each product category illustrated the response time
distributions. Categories like Health & Beauty had narrow peaks,
reflecting quicker and more consistent decision-making.
2. Scatter Plots:
Scatter plots showed that categories with higher drift rates (e.g., Health &
Beauty) corresponded to lower response times, while categories like Home
& Garden exhibited the opposite trend.
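The sketch below illustrates these two plot types, reusing the assumed "purchases" and "summary" objects from the earlier sketch; it is illustrative only.

```python
# Sketch of the histograms and scatter plot described above, reusing the
# assumed "purchases" and "summary" objects from the earlier sketch.
import matplotlib.pyplot as plt

for category, group in purchases.groupby("product_type"):
    plt.hist(group["response_time"], bins=30, alpha=0.5, label=str(category))
plt.xlabel("Response time (s)")
plt.ylabel("Number of purchases")
plt.legend()
plt.title("Response-time distributions by product type")
plt.show()

plt.scatter(summary["drift_rate_proxy"], summary["mean"])
plt.xlabel("Drift-rate proxy")
plt.ylabel("Mean response time (s)")
plt.title("Drift rate vs. mean response time by product type")
plt.show()
```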
3.3. Interpretation of Results
1. Fastest Decisions:
○ Categories like Health & Beauty were associated with the quickest
purchasing decisions.
○ This may be attributed to high demand, lower costs, or simpler
decision-making processes for these items.
2. Slower Decisions:
○ Products in Home & Garden and Electronics required more
deliberation, potentially due to their higher cost, complexity, or
longer-term utility.
○ These patterns suggest that users are more cautious when
purchasing expensive or less familiar products.
3. Statistical Significance:
○ An ANOVA test revealed statistically significant differences in
response times across product categories (p-value = 0.03).
○ This confirms that the variation in decision-making times is unlikely
due to random chance and is influenced by product type.
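A minimal sketch of such a one-way ANOVA with scipy is shown below, using the same assumed column names as the earlier sketches.

```python
# Sketch of the one-way ANOVA across product categories; column names are
# the same assumptions used in the earlier sketches.
import pandas as pd
from scipy import stats

df = pd.read_csv("online_shoppers_intention.csv")
purchases = df[df["Revenue"] == True]
groups = [g["response_time"].to_numpy() for _, g in purchases.groupby("product_type")]
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```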
4. Conclusion
This report provided insights into online shoppers’ behavior using Drift Diffusion
Models. Key findings include:
● Purchasing Behavior:
○ On average, users take 7.44 seconds to complete a purchase, with
products like Health & Beauty exhibiting the quickest decisions.
○ Categories like Electronics and Home & Garden involve slower
decision-making due to higher deliberation needs.
● Drift Diffusion Analysis:
○ Drift rates are significantly higher for simpler, lower-cost items.
○ Slower drift rates are observed for high-involvement products, such
as Electronics.
● Recommendations:
○ Businesses should highlight key features and provide transparent
pricing for high-deliberation items to reduce decision-making times.
○ For quicker-purchase categories, focus on easy navigation and
convenience to maintain high conversion rates.
5. References
● UCI Online Shoppers Purchasing Intention Dataset:
https://archive.ics.uci.edu/dataset/468/online+shoppers+purchasing+intenti
on+dataset
● Python Libraries:
○ PyDDM Documentation
○ Pandas Documentation
○ Scipy Documentation
● Statistical Analysis Resources:
○ ANOVA Test Methodology and Applications
6. Appendices
○ Appendix A: Code
○ Appendix B: Data Files (CSV)