Report

This report investigates biases in search results across major search engines and analyzes online shopper behavior using Drift Diffusion Models. Part A identifies biases in search results, their causes, and provides recommendations for improvement, while Part B examines purchasing intentions and decision-making speeds across different product categories. Key findings include that users take an average of 7.44 seconds to complete a purchase, with faster decisions for lower-cost items and slower deliberation for higher-cost products.

Uploaded by

harnoorsingh2509

Exploring the Biases in Search Results & Behavioral Analysis using Drift Diffusion Models

Harnoor Singh (230441)

Team Member Names : Sundar Shyam (221049), Rishab (220885)

Contribution : Whole Section B

Instructor Name : Pragathi P. Balasubramani

Course Name : CGS616: HUMAN CENTERED COMPUTING






Table of Contents​
Index

1.​Introduction
2.​Part A: Exploring the Biases in Search Results​
2.1 Methodology​
2.2 WordCloud Analysis​
2.3 Frequency Analysis of Biases​
2.4 Possible Causes for Biases​
2.5 Recommendations​
2.6 Limitations and Challenges
3.​Part B: Behavioral Analysis Using Drift Diffusion Models​
3.1 Methodology​
3.2 Results and Analysis​
3.3 Interpretation of Results
4.​Conclusion
5.​References
6.​Appendices
○​ Appendix A: Code
○​ Appendix B: Data Files (CSV)

1. Introduction

The purpose of this report is to explore and analyze two distinct topics:

● Part A: The biases present in search results across multiple search engines (e.g., Google, Bing, DuckDuckGo).

● Part B: Behavioral analysis of online shoppers using Drift Diffusion Models (DDM), based on the Online Shoppers Purchasing Intention dataset from the UCI repository.

In Part A, the findings focus on identifying biases, understanding the causes behind them, and providing recommendations for reducing them.

In Part B, we analyze purchasing intentions to determine which factors influence consumer behavior and how quickly decisions are made for different product types.







Part A: Exploring the Biases in Search Results

2.1. Methodology

Data was collected by web scraping four major search engines:

● Google
● Bing
● DuckDuckGo
● Yahoo

The scraping was performed using Selenium and BeautifulSoup, with nine queries related to the topic “Is Facebook listening to our conversations?”. Searches were conducted from three countries (India, Netherlands, and USA) in incognito browsing mode to limit personalization. All collected data was consolidated into a single CSV file for further analysis, which covered:

1. Frequent words in the search results (presented as WordClouds).
2. Frequency and ordering of results, links, and domains.
3. Diversity of perspectives.
4. Sentiment analysis.
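
The consolidation step described above can be sketched as follows. This is a minimal illustration, not the scraper used for the report: the column names (engine, country, query, rank, title, url, description) and the sample records are assumptions, and the scraping itself (Selenium plus BeautifulSoup) is left out.

```python
import csv
import io

# Hypothetical column layout; the report does not list the columns of its CSV.
FIELDS = ["engine", "country", "query", "rank", "title", "url", "description"]


def consolidate(records, out_file):
    """Write scraped result records (dicts) to one CSV and return the row count."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for record in records:
        # Fill missing fields with an empty string so every row has the same columns.
        writer.writerow({field: record.get(field, "") for field in FIELDS})
        count += 1
    return count


# Made-up sample records standing in for scraped results.
sample = [
    {"engine": "Google", "country": "India", "query": "is facebook listening",
     "rank": 1, "title": "Example title", "url": "https://example.com/a"},
    {"engine": "Bing", "country": "USA", "query": "is facebook listening",
     "rank": 1, "title": "Another title", "url": "https://example.org/b",
     "description": "Example snippet"},
]

buffer = io.StringIO()  # in-memory stand-in for open("results.csv", "w", newline="")
rows_written = consolidate(sample, buffer)
csv_text = buffer.getvalue()
```

Keeping engine, country, and query as columns in one file means every later analysis (word counts, domain counts, sentiment) can be grouped by any of the three without re-reading separate files.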




2.2. WordCloud Analysis

● Method: WordClouds were generated to visualize the most frequent words in the search results.
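
A word cloud is a rendering of word counts, so the underlying step can be sketched with the standard library alone. The stop-word list and the sample titles below are assumptions made for illustration; they are not the data used in the report.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real pipelines use longer lists).
STOP_WORDS = {"the", "is", "to", "a", "of", "and", "in", "on", "our", "your", "you"}


def word_frequencies(texts):
    """Count lowercase word occurrences across texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


# Made-up titles standing in for scraped search results.
titles = [
    "Is Facebook listening to your conversations?",
    "Facebook denies listening through your phone microphone",
    "How to stop apps listening on your phone",
]

freqs = word_frequencies(titles)
top_words = freqs.most_common(3)  # the largest words in a word cloud
```

A rendering library can then draw these counts; the `wordcloud` package, for instance, accepts such a mapping through `generate_from_frequencies`.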


[Figure: WordCloud for Google]

[Figure: WordCloud for Bing]

[Figure: WordCloud for DuckDuckGo]

[Figure: WordCloud for Yahoo]


Frequency of Domains

Analysis: The frequency of websites/domains appearing in the search results was calculated. Certain engines, like Google, favored established domains (e.g., news outlets), while DuckDuckGo included more niche perspectives.
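
The domain count can be sketched as below. The (engine, url) pairs are made up for illustration, and stripping a leading "www." is an assumption about how domains were normalized; the report does not describe its exact procedure.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Return the host part of a URL, lowercased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def domain_frequencies(rows):
    """Map each engine to a Counter of the domains in its results."""
    per_engine = {}
    for engine, url in rows:
        per_engine.setdefault(engine, Counter())[domain_of(url)] += 1
    return per_engine


# Made-up (engine, url) pairs standing in for the scraped results.
rows = [
    ("Google", "https://www.news-site.example/technology-1"),
    ("Google", "https://www.news-site.example/technology-2"),
    ("Google", "https://www.other-news.example/article"),
    ("DuckDuckGo", "https://small-blog.example/post"),
    ("DuckDuckGo", "https://www.news-site.example/technology-1"),
]

freq = domain_frequencies(rows)
```

Calling `freq["Google"].most_common()` then lists the domains an engine returns most often, which is the comparison the paragraph above describes.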

The frequency analysis of websites/domains in the search results of the four search engines is summarized in the Excel file. The full dataset can be accessed at the link below:
https://docs.google.com/spreadsheets/d/1qOT8Pn138kcOp-_0TK0aZavp-vcb1edxHA4a4zXCo3I/edit?usp=drive_link

Diversity of Perspectives: DuckDuckGo consistently showed a broader range
of perspectives compared to Google, which often displayed results from major
corporations or media outlets.

[Figures: perspective analysis for Google, Bing, DuckDuckGo, and Yahoo]


Sentiment Analysis

● Method: Sentiment analysis was applied to the titles and descriptions of search results to determine whether they leaned towards positive, negative, or neutral sentiment.

Polarity:
Polarity measures the sentiment of the text on a scale from -1 to +1:
+1 (Positive): Very positive sentiment (e.g., "I love this product!")
0: Neutral sentiment (e.g., "This is a book.")
-1 (Negative): Very negative sentiment (e.g., "I hate this product!")

Interpretation:
0 to +0.2: Slightly positive sentiment.
+0.2 to +0.5: Positive sentiment, but not overly enthusiastic.
+0.5 to +1: Strongly positive sentiment.
0 to -0.2: Slightly negative sentiment.
-0.2 to -0.5: Negative sentiment, but not highly negative.
-0.5 to -1: Strongly negative sentiment.

Subjectivity:
Subjectivity measures how much of the text is based on personal opinions or feelings rather than factual information. It ranges from 0 to 1:
0: Very objective (factual, unbiased).
1: Very subjective (opinionated, emotional).

Interpretation:
0 to 0.3: Mostly objective content (fact-based, unbiased).
0.3 to 0.6: Moderately subjective content (contains both facts and opinions).
0.6 to 1: Mostly subjective content (opinion-based, emotional).
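
The interpretation bands above can be written as two small functions and applied to the scores reported for each engine. The treatment of the boundary values (for example, 0.2 counting as "slightly" rather than plain positive or negative) is an assumption, since the bands as listed share their endpoints.

```python
def polarity_label(p):
    """Map a polarity score in [-1, 1] to the interpretation bands listed above."""
    if not -1.0 <= p <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if p == 0:
        return "neutral"
    sign = "positive" if p > 0 else "negative"
    magnitude = abs(p)
    if magnitude <= 0.2:  # boundary assigned to the weaker band (assumption)
        return f"slightly {sign}"
    if magnitude <= 0.5:
        return sign
    return f"strongly {sign}"


def subjectivity_label(s):
    """Map a subjectivity score in [0, 1] to the interpretation bands listed above."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("subjectivity must lie in [0, 1]")
    if s <= 0.3:
        return "mostly objective"
    if s <= 0.6:
        return "moderately subjective"
    return "mostly subjective"


# (polarity, subjectivity) pairs as reported in this section.
reported = {
    "Google": (-0.0135761201, 0.4817440769),
    "Yahoo": (-0.1175324675, 0.602987013),
    "DuckDuckGo": (-0.08151755652, 0.6006859267),
    "Bing": (-0.07211317723, 0.5392102847),
}

labels = {
    engine: (polarity_label(p), subjectivity_label(s))
    for engine, (p, s) in reported.items()
}
```

Under these bands all four engines fall in the "slightly negative" polarity range; Google and Bing read as moderately subjective, while Yahoo and DuckDuckGo sit just above the 0.6 boundary.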




GOOGLE

File Name     Polarity          Subjectivity
Google.txt    -0.0135761201     0.4817440769



















YAHOO

File Name     Polarity          Subjectivity
Yahoo.txt     -0.1175324675     0.602987013
















DUCKDUCKGO

File Name     Polarity          Subjectivity
Duck.txt      -0.08151755652    0.6006859267














BING

File Name     Polarity          Subjectivity
Bing.txt      -0.07211317723    0.5392102847











Differentiation Between Search Engines

The comparison between search engines revealed:

● Frequency of Results: Google provided the highest number of results, while DuckDuckGo emphasized privacy by limiting unnecessary data exposure.
● Content Focus: Yahoo often prioritized entertainment-related content, while Google and Bing provided more professional/academic results.
● Diversity of Perspectives: DuckDuckGo consistently showed a broader range of perspectives compared to Google, which often displayed results from major corporations or media outlets.
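
The report does not state how diversity of perspectives was measured. One possible way to put a number on it, offered here purely as an assumption, is the Shannon entropy of the domains an engine returns: the more evenly results are spread over many domains, the higher the value. The domain lists below are made up.

```python
import math
from collections import Counter


def shannon_entropy(domains):
    """Shannon entropy (in bits) of a list of domains; higher means more varied sources."""
    counts = Counter(domains)
    total = sum(counts.values())
    # Sum of p * log2(1/p) over the observed domains.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Made-up result lists: one dominated by a single domain, one evenly spread.
concentrated = ["big-outlet.example"] * 3 + ["blog.example"]
varied = ["a.example", "b.example", "c.example", "d.example"]

entropy_concentrated = shannon_entropy(concentrated)
entropy_varied = shannon_entropy(varied)
```

Entropy only captures how results are spread across domains; whether those domains actually express different viewpoints still needs a manual or content-based check.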


2.3. Google’s Behavior Across India, Netherlands, and USA

Google’s search results were analyzed specifically for changes in behavior across the three countries:

● India: Results emphasized regional news outlets and local commercial sites, reflecting strong localization.
● Netherlands: Search results tended to include a mix of local news, EU-centric content, and English-language sources, with less commercialization compared to the USA.
● USA: Results leaned heavily toward commercial content and advertisements, with major news outlets dominating the rankings.
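
A simple way to quantify how much results differ between countries is the Jaccard overlap of the domain sets returned for the same queries. This measure is an assumption introduced for illustration and is not a method described in the report; the domain names are placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Placeholder domain sets per country (not the report's data).
domains_by_country = {
    "India": {"in-news.example", "global-news.example", "in-shop.example"},
    "Netherlands": {"nl-news.example", "global-news.example", "eu-portal.example"},
    "USA": {"us-news.example", "global-news.example", "us-shop.example"},
}

overlaps = {
    (c1, c2): jaccard(domains_by_country[c1], domains_by_country[c2])
    for c1, c2 in combinations(domains_by_country, 2)
}
```

A value near 1 means two countries see almost the same sources, while a value near 0 indicates strong localization.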




INDIA
[Figures: domain frequency, perspective analysis, sentiment analysis]

USA
[Figures: domain frequency, perspective analysis, sentiment analysis, word cloud]

Netherlands
[Figures: word cloud, domain frequency, perspective analysis, sentiment analysis]




2.4. Possible Causes for Biases

The observed biases in search results may arise from several factors:

1. Algorithm Design: Search engines prioritize specific types of content (e.g., commercial, academic) based on their ranking algorithms.
2. Localization: Regional variations influence the results displayed in different countries.
3. Personalization: Even in incognito mode, engines can still infer the user's location (for example, from the IP address) and factor in broader geographic trends.
4. Economic Influences: Sponsored content tends to dominate in markets like the USA.

2.5. Recommendations

To mitigate these biases, the following steps are recommended:

1. Algorithm Transparency: Search engines should disclose their ranking criteria.
2. Inclusion of Diverse Sources: Actively include a mix of perspectives in search results.
3. User Education: Educate users about biases and how search engines prioritize content.

2.6. Limitations and Challenges

1. Technical Limitations: Some engines (e.g., Google) actively block scraping attempts.
2. Subjectivity: Analysis of “bias” involves subjective judgment, which may affect the results.
3. Data Volume: Processing a large volume of data across multiple queries and countries was resource-intensive.




Part B: Behavioral Analysis Using Drift Diffusion Models

3.1. Methodology

The Online Shoppers Purchasing Intention dataset was used to model purchasing behavior using Drift Diffusion Models (DDMs). The following steps were undertaken:

1. Data Cleaning:
   ○ The dataset was preprocessed by filtering for completed purchases (revenue = True) and removing missing or inconsistent entries.
   ○ Columns irrelevant to the analysis (e.g., session identifiers) were excluded.
2. Modeling:
   ○ Python libraries like pyddm, scipy, and pandas were used to apply the Drift Diffusion Model.
   ○ The drift rate (the rate of evidence accumulation, interpreted here as decision speed), mean response time, and standard deviation were calculated for each product category.
   ○ The analysis focused on identifying behavioral patterns in purchases based on product types.
3. Visualization:
   ○ Histograms of response times were generated for different product types to observe their distribution.
   ○ Scatter plots were used to compare drift rates and response times, showing visually how decision-making speed varies with product type.
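
The cleaning and summary steps above can be sketched with the standard library. The column names category and response_time are placeholders, since the report does not say which dataset fields were used for them; Revenue is the purchase flag, and all rows shown are made up.

```python
import csv
import io
import statistics
from collections import defaultdict

# Made-up rows; `category` and `response_time` are placeholder column names.
RAW = """category,response_time,Revenue
Health & Beauty,3.0,TRUE
Health & Beauty,5.0,TRUE
Electronics,8.0,TRUE
Electronics,,TRUE
Electronics,12.0,FALSE
Electronics,10.0,TRUE
"""


def load_purchases(text):
    """Keep rows that are completed purchases and have a usable response time."""
    kept = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["Revenue"].strip().upper() != "TRUE":
            continue  # not a completed purchase
        try:
            rt = float(row["response_time"])
        except ValueError:
            continue  # missing or malformed entry
        kept.append((row["category"], rt))
    return kept


def summarize(purchases):
    """Return {category: (mean, sample standard deviation)} of response times."""
    groups = defaultdict(list)
    for category, rt in purchases:
        groups[category].append(rt)
    return {
        cat: (statistics.mean(rts), statistics.stdev(rts) if len(rts) > 1 else 0.0)
        for cat, rts in groups.items()
    }


purchases = load_purchases(RAW)
summary = summarize(purchases)
```

The same filtering and grouping is typically a few lines with pandas; the standard-library version is used here only to keep the sketch dependency-free.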




3.2. Results and Analysis​

Behavioral Analysis Report - Online Shoppers Purchasing Intention

Drift rates results

Response Times for Purchases

● Mean Response Time: 7.44 seconds
● Standard Deviation: 4.78 seconds

These results indicate that, on average, users take about 7.44 seconds to complete a purchase. However, there is variability in this behavior, as highlighted by the standard deviation.

Drift Rates by Product Types

● Health & Beauty: Products in this category exhibited the highest drift rates, indicating faster decision-making.
● Electronics: Lower drift rates were observed for this category, suggesting users take more time to research and deliberate before purchasing.
● Home & Garden: This category displayed the slowest decision-making times, pointing to greater consumer deliberation.
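
The link between drift rate and decision time can be illustrated with a hand-written simulation of a basic drift diffusion process: evidence starts at zero, gains drift * dt plus Gaussian noise at each time step, and the trial ends when it reaches either boundary. This is a sketch under assumed parameter values (threshold, noise, time step, drift rates) and does not use pyddm or reproduce the report's fitted model.

```python
import random


def simulate_rt(drift, threshold=1.0, noise=1.0, dt=0.01, max_time=20.0, rng=random):
    """Simulate one trial; return the time (s) at which evidence hits +/- threshold."""
    evidence, t = 0.0, 0.0
    step_sd = noise * dt ** 0.5  # noise standard deviation per time step
    while abs(evidence) < threshold and t < max_time:
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    return t


def mean_rt(drift, trials=500, seed=0):
    """Average simulated decision time over many trials with a fixed seed."""
    rng = random.Random(seed)
    return sum(simulate_rt(drift, rng=rng) for _ in range(trials)) / trials


# Illustrative drift rates (assumptions, not values estimated in the report).
slow = mean_rt(drift=0.5)  # low drift: evidence accumulates slowly
fast = mean_rt(drift=2.0)  # high drift: the boundary is reached sooner
```

With all other parameters fixed, the higher drift rate yields the shorter mean decision time, which is the relationship the category comparison above relies on.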

Visualizations

1. Histograms:
   Histograms for each product category illustrated the response time distributions. Categories like Health & Beauty had narrow peaks, reflecting quicker and more consistent decision-making.

2. Scatter Plots:
   Scatter plots showed that categories with higher drift rates (e.g., Health & Beauty) correspond to lower response times, while categories like Home & Garden exhibited the opposite trend.

3.3. Interpretation of Results

1. Fastest Decisions:
   ○ Categories like Health & Beauty were associated with the quickest purchasing decisions.
   ○ This may be attributed to high demand, lower costs, or simpler decision-making processes for these items.
2. Slower Decisions:
   ○ Products in Home & Garden and Electronics required more deliberation, potentially due to their higher cost, complexity, or longer-term utility.
   ○ These patterns suggest that users are more cautious when purchasing expensive or less familiar products.
3. Statistical Significance:
   ○ An ANOVA test revealed statistically significant differences in response times across product categories (p-value = 0.03, below the conventional 0.05 threshold).
   ○ This suggests that the variation in decision-making times is unlikely to be due to random chance alone and is associated with product type.
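
For reference, the F statistic behind a one-way ANOVA is the ratio of between-group to within-group variance. The sketch below computes it by hand on made-up numbers; it does not reproduce the report's data or its p-value.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up response times (seconds) for three hypothetical categories.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
```

In practice `scipy.stats.f_oneway` takes the groups as separate arguments and returns both the F statistic and the p-value, which is then compared against the chosen significance level.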

4. Conclusion

This report provided insights into online shoppers’ behavior using Drift Diffusion Models. Key findings include:

● Purchasing Behavior:
   ○ On average, users take 7.44 seconds to complete a purchase, with products like Health & Beauty exhibiting the quickest decisions.
   ○ Categories like Electronics and Home & Garden involve slower decision-making due to higher deliberation needs.
● Drift Diffusion Analysis:
   ○ Drift rates are higher for simpler, lower-cost items.
   ○ Lower drift rates are observed for high-involvement products, such as Electronics.
● Recommendations:
   ○ Businesses should highlight key features and provide transparent pricing for high-deliberation items to reduce decision-making times.
   ○ For quicker-purchase categories, focus on easy navigation and convenience to maintain high conversion rates.



5. References

● UCI Online Shoppers Purchasing Intention Dataset: https://archive.ics.uci.edu/dataset/468/online+shoppers+purchasing+intention+dataset
● Python Libraries:
   ○ PyDDM Documentation
   ○ Pandas Documentation
   ○ SciPy Documentation
● Statistical Analysis Resources:
   ○ ANOVA Test Methodology and Applications

6. Appendices

○​ Appendix A: Code
○​ Appendix B: Data Files (CSV)
