Determine how frequently IP addresses may change for a given user
Closed, ResolvedPublic

Description

Motivation

With the cookie-based IP Masking implementation, multiple IP addresses may be connected to the same (temporary) account. The username will remain the same as long as the user retains the browser cookie. However, we will still keep track of all the IP addresses they use in the backend and expose information about them in tools such as IP Info.

We want to understand how often users switch around their IP addresses (both the average and worst-case) in:

  1. 30 days
  2. 90 days

We also want to know how this information varies across different wikis.

This information will help us make design decisions as we evaluate how tools like IP Info will need to change.

If possible, as a bonus, we would also like to understand how much variation exists in the geolocation, connection method, connection owner, and proxy information for these IPs. This information is accessible through the MaxMind databases.

Done
  • Number of IPs per each user in 1 month
  • Number of IPs per each user in 2 months. (We only have IP info for the latest 2 months, therefore we provided insights on 60 days instead of the requested 90 days)
  • Select some wikis to see how the pattern varies across different wikis.

Event Timeline

Niharika triaged this task as Medium priority. Apr 15 2022, 11:08 PM
Niharika created this task.
  • Number of IPs per each user in 1 month
  • Number of IPs per each user in 2 months

Summary

We selected English Wikipedia and Portuguese Wikipedia to study the distribution of the number of IPs per user, and how that distribution differs between the 2-month period and the 1-month period.

On both English Wikipedia and Portuguese Wikipedia, the average number of IPs per user is slightly higher over the longer period. This is in line with our expectation that users have more chances to switch IPs over a longer time. But for 97.5% of users, the number of IPs per user is the same in the 2-month period and the 1-month period.
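
The per-user counts and percentiles reported below can be reproduced in spirit with a short script. This is a minimal sketch, not the query actually used for the analysis: the `(user, ip, timestamp)` record layout, the function names, the sample data, and the choice of linear-interpolation percentiles are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def ips_per_user(records, end, days):
    """Count distinct IPs per user for records in the `days` days before `end`.

    `records` is an iterable of (user, ip, timestamp) tuples. This layout is
    an assumption for illustration, not the schema used in the analysis.
    """
    start = end - timedelta(days=days)
    seen = defaultdict(set)
    for user, ip, ts in records:
        if start <= ts < end:
            seen[user].add(ip)
    return {user: len(ips) for user, ips in seen.items()}


def percentile(values, q):
    """Percentile with linear interpolation between neighbours (q in 0..100)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarize(counts):
    """Summary statistics over a non-empty {user: ip_count} mapping."""
    values = list(counts.values())
    return {
        "users": len(values),
        "avg": sum(values) / len(values),
        "max": max(values),
        "p95": percentile(values, 95),
        "p97.5": percentile(values, 97.5),
        "p99": percentile(values, 99),
    }


# Hypothetical sample data (documentation-range IPs, made-up users).
records = [
    ("UserA", "192.0.2.1", datetime(2022, 4, 20)),
    ("UserA", "192.0.2.2", datetime(2022, 4, 25)),
    ("UserA", "192.0.2.1", datetime(2022, 4, 26)),  # repeat IP, counted once
    ("UserB", "198.51.100.7", datetime(2022, 4, 10)),
    ("UserA", "192.0.2.3", datetime(2022, 3, 15)),  # only in the 60-day window
    ("UserC", "203.0.113.9", datetime(2022, 3, 20)),  # only in the 60-day window
]
end = datetime(2022, 5, 1)

for days in (30, 60):
    print(days, summarize(ips_per_user(records, end, days)))
```

A longer window can only add IPs for a given user, which is why the maximum and the average tend to grow with the window while the lower percentiles stay put.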

English Wikipedia

Within 1 month

  • 212 bot users have an average of 15 IP addresses per user. The most IP addresses one user has is 2111.
  • 370164 non-bot users have an average of 1.9 IP addresses per user. The most IP addresses one user has is 518.
  • 95% of users have no more than 5 IPs.
  • 97.5% of users have no more than 9 IPs.
  • 99% of users have no more than 18 IPs.

Within 2 months

  • 320 bot users have an average of 15 IP addresses per user. The most IP addresses one user has is 3571.
  • 618970 non-bot users have an average of 2.1 IP addresses per user. The most IP addresses one user has is 731.
  • 95% of users have no more than 5 IPs.
  • 97.5% of users have no more than 10 IPs.
  • 99% of users have no more than 21 IPs.

Portuguese Wikipedia

Within 1 month

  • 10 bot users have an average of 5.7 IP addresses per user. The most IP addresses one user has is 34.
  • 23623 non-bot users have an average of 1.9 IP addresses per user. The most IP addresses one user has is 135.
  • 95% of users have no more than 5 IPs.
  • 97.5% of users have no more than 9 IPs.
  • 99% of users have no more than 17 IPs.

Within 2 months

  • 13 bot users have an average of 5.4 IP addresses per user. The most IP addresses one user has is 34.
  • 40350 non-bot users have an average of 2.0 IP addresses per user. The most IP addresses one user has is 261.
  • 95% of users have no more than 5 IPs.
  • 97.5% of users have no more than 10 IPs.
  • 99% of users have no more than 20 IPs.

  • Select some wikis to see how the pattern varies across different wikis.

Summary

We selected 30 wikis and compared their patterns using one month of data. The wikis are those ranked 1st-20th, 101st-105th, and 201st-205th in the wiki comparison.

We observed:

  • On all selected wikis, 50% of users have only 1 IP.
  • The top 20 wikis show a relatively consistent pattern. On most of them, the 95th percentile is between 3 and 12 IPs per user.
  • The wikis ranked after 100 do not show a consistent pattern. The smaller wikis have larger variance because they have fewer users, so a few abnormal users can skew the distribution.
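
The small-sample effect described in the last point can be illustrated with made-up numbers. The sketch below is an assumption-laden toy example, not data from any wiki: it adds a single heavy user (50 IPs) to a hypothetical wiki of 60 users and to one of 6,000 users with the same underlying distribution, and compares the 99th percentile, computed with linear interpolation.

```python
def percentile(values, q):
    """Percentile with linear interpolation between neighbours (q in 0..100)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Hypothetical IP counts per user: 95% of users have 1 IP, 5% have 2.
small_wiki = [1] * 57 + [2] * 3        # 60 users
large_wiki = [1] * 5700 + [2] * 300    # 6,000 users

# One abnormal user with 50 IPs joins each wiki.
small_with_outlier = small_wiki + [50]
large_with_outlier = large_wiki + [50]

for name, counts in [
    ("small", small_wiki),
    ("small + outlier", small_with_outlier),
    ("large", large_wiki),
    ("large + outlier", large_with_outlier),
]:
    avg = sum(counts) / len(counts)
    print(f"{name:16s} p99={percentile(counts, 99):6.2f} avg={avg:.3f}")
```

In this toy setup the single outlier pulls the small wiki's 99th percentile from 2 to above 20, while the large wiki's 99th percentile does not move.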
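
Top 20 wikis

| wiki_db | 50_percentile | 95_percentile | 97.5_percentile | 99_percentile | max | avg | count |
| --- | --- | --- | --- | --- | --- | --- | --- |
| arwiki | 1 | 4 | 7 | 14 | 244 | 1.720134 | 19445 |
| commonswiki | 1 | 3 | 6 | 12 | 182 | 1.532009 | 95927 |
| dewiki | 1 | 12 | 24 | 39 | 3859 | 3.191493 | 45725 |
| enwiki | 1 | 5 | 9 | 18 | 518 | 1.898604 | 370164 |
| eswiki | 1 | 4 | 7 | 16 | 143 | 1.754224 | 47352 |
| fawiki | 1 | 7 | 12 | 24.3 | 155 | 2.299108 | 12671 |
| frwiki | 1 | 6 | 12 | 25 | 189 | 2.154934 | 47007 |
| idwiki | 1 | 7 | 12 | 27.96 | 136 | 2.292324 | 9705 |
| itwiki | 1 | 6 | 11 | 22.66 | 2651 | 2.164111 | 23935 |
| jawiki | 1 | 8 | 15 | 26 | 141 | 2.404861 | 30324 |
| kowiki | 1 | 3 | 5 | 11.05 | 134 | 1.490535 | 7396 |
| nlwiki | 1 | 6 | 13 | 25 | 108 | 2.110863 | 10301 |
| plwiki | 1 | 5 | 9.45 | 20 | 113 | 1.911503 | 12023 |
| ptwiki | 1 | 5 | 9 | 17 | 135 | 1.894637 | 23623 |
| ruwiki | 1 | 5 | 9 | 19 | 134 | 1.852721 | 42002 |
| trwiki | 1 | 5 | 8 | 14 | 150 | 1.72735 | 15265 |
| ukwiki | 1 | 2 | 4 | 9 | 485 | 1.389195 | 13235 |
| viwiki | 1 | 5 | 10 | 22.75 | 194 | 2.053763 | 7626 |
| wikidatawiki | 1 | 9 | 16 | 27 | 193 | 2.422368 | 59003 |
| zhwiki | 1 | 5 | 10 | 18 | 121 | 1.984803 | 27308 |

101-105th wikis

| wiki_db | 50_percentile | 95_percentile | 97.5_percentile | 99_percentile | max | avg | count |
| --- | --- | --- | --- | --- | --- | --- | --- |
| dewikivoyage | 1 | 5 | 8.975 | 24.39 | 102 | 2.276243 | 362 |
| hewikisource | 1 | 5.1 | 9.55 | 35.76 | 58 | 2.0358 | 419 |
| jawiktionary | 1 | 2 | 5 | 13.98 | 52 | 1.522449 | 735 |
| nnwiki | 1 | 1 | 2 | 4 | 16 | 1.123798 | 832 |
| zhwiktionary | 1 | 1 | 2 | 6 | 23 | 1.170115 | 870 |

201-205th wikis

| wiki_db | 50_percentile | 95_percentile | 97.5_percentile | 99_percentile | max | avg | count |
| --- | --- | --- | --- | --- | --- | --- | --- |
| eswikiquote | 1 | 4 | 5 | 7 | 21 | 1.452489 | 221 |
| lmowiki | 1 | 2 | 3 | 6.02 | 23 | 1.22 | 400 |
| mrwikisource | 1 | 14.8 | 17.475 | 18 | 18 | 2.758065 | 62 |
| slwikisource | 1 | 2.7 | 4.675 | 5 | 5 | 1.240741 | 54 |
| thwikisource | 1 | 16.4 | 26.75 | 31.36 | 34 | 3.268657 | 67 |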

Thank you so much Jennifer! This is really helpful. I think the main thing we were looking for was the finding that 95% of users have no more than 5 IPs. This could serve as a proxy for how many IPs will be behind most temporary user accounts. But I'm also wary of temporary accounts having significantly different behaviors.

I had two follow up questions:

  • How much of this do you think has to do with the number of edits? Is it that the number of edits and the number of IPs are almost the same? Or is it that the number of edits has no bearing on the number of IPs?
  • Is there any difference between the number of IPs for users who’ve registered recently (say in the last 3 months) and those who’ve had accounts for longer?
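
One way to approach the first question is a rank correlation between each user's edit count and IP count. The following is only a sketch of that idea under stated assumptions: the per-user numbers are invented, the helper names are hypothetical, and Spearman's rank correlation is a suggested technique, not something the analysis above used. A value near 1 would mean that users with more edits also tend to have more IPs; a value near 0 would mean edit count has little bearing on IP count.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-user data: edits and distinct IPs in the same month.
edits = [1, 3, 3, 10, 40, 250]
ips = [1, 1, 2, 2, 5, 18]

print(f"Spearman correlation: {spearman(edits, ips):.3f}")
```

The same comparison could be run separately for recently registered and long-standing accounts to address the second question.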

We ran IP Masking related usability tests in English, Spanish, Arabic, and Japanese (ongoing), and it's really cool to see that data here (clipping from your table above).

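| wiki_db | 50_percentile | 95_percentile | 97.5_percentile | 99_percentile | max | avg | count |
| --- | --- | --- | --- | --- | --- | --- | --- |
| arwiki | 1 | 4 | 7 | 14 | 244 | 1.720134 | 19445 |
| enwiki | 1 | 5 | 9 | 18 | 518 | 1.898604 | 370164 |
| eswiki | 1 | 4 | 7 | 16 | 143 | 1.754224 | 47352 |
| jawiki | 1 | 8 | 15 | 26 | 141 | 2.404861 | 30324 |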

I don't think I have any insights from this yet, but I will keep it in mind as we go through our results. Thanks!

jwang updated the task description.