Clean up GrowthExperiments-related user_properties rows
Open, Low, Public

Description

It looks like 17+% of user_properties rows are used by GrowthExperiments code. That's a lot of rows – I think we should invest some effort into removing the rows we don't actually need, consolidating some properties (e.g. as part of T284088: Merge preference "Enable the editor help panel" with "Display the newcomer homepage" together), etc.

The table below shows the number of GrowthExperiments-related rows on a few wikis (mostly Growth team pilot wikis and wikis that have had the features for the longest time):

In [23]: pivot_df.sort_values('Growth %', ascending=False)
Out[23]:
Variable   Growth  Growth_no_mentorship     Total  Growth %
wiki
arwiki    2436441               2237462  14499219     16.80
viwiki     778427                715775   5022918     15.50
kowiki     431262                391826   3449192     12.50
cswiki     297077                270141   2570822     11.56
frwiki    2043960               1899497  22215626      9.20
bnwiki     162257                156591   2107438      7.70
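
The "Growth %" column appears to be the Growth row count divided by the Total row count. The sketch below recomputes it from the figures in the table as a sanity check; the dictionary layout and the function name are illustrative assumptions, not part of the original notebook.

```python
# Row counts copied from the table above: wiki -> (Growth, Total).
ROW_COUNTS = {
    "arwiki": (2436441, 14499219),
    "viwiki": (778427, 5022918),
    "kowiki": (431262, 3449192),
    "cswiki": (297077, 2570822),
    "frwiki": (2043960, 22215626),
    "bnwiki": (162257, 2107438),
}


def growth_share(growth_rows: int, total_rows: int) -> float:
    """Percentage of user_properties rows that are Growth-related."""
    return round(100 * growth_rows / total_rows, 2)


# Assumed definition of the "Growth %" column: Growth / Total * 100.
shares = {wiki: growth_share(g, t) for wiki, (g, t) in ROW_COUNTS.items()}

for wiki, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{wiki:8s} {share:6.2f}")
```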

This is the list of properties occupying the most rows:

mysql:research@dbstore1007.eqiad.wmnet [cswiki]> select up_property, count(*) from user_properties where up_property like 'growthexperiments-%' or up_property='welcomesurvey-responses' group by up_property having count(*) > 1500 order by count(*) desc;
+--------------------------------------------------------+----------+
| up_property                                            | count(*) |
+--------------------------------------------------------+----------+
| growthexperiments-help-panel-tog-help-panel            |    40789 |
| growthexperiments-homepage-enable                      |    35194 |
| growthexperiments-homepage-pt-link                     |    35035 |
| growthexperiments-tour-help-panel                      |    32918 |
| growthexperiments-tour-homepage-mentorship             |    32007 |
| growthexperiments-mentor-id                            |    26936 |
| growthexperiments-tour-homepage-discovery              |    20520 |
| growthexperiments-homepage-variant                     |    19715 |
| growthexperiments-tour-homepage-welcome                |    17258 |
| growthexperiments-homepage-suggestededits-activated    |    12661 |
| growthexperiments-homepage-suggestededits-preactivated |     7211 |
| growthexperiments-homepage-se-filters                  |     6098 |
| growthexperiments-homepage-se-ores-topic-filters       |     4030 |
| welcomesurvey-responses                                |     2793 |
| growthexperiments-homepage-tutorial-completed          |     2646 |
+--------------------------------------------------------+----------+
15 rows in set (0.107 sec)

growthexperiments-mentor-id will be dealt with separately, as part of T304461: Delete `growthexperiments-mentor-id` properties from user_properties, as the rows are no longer necessary.
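
As a rough illustration of what that separate cleanup would buy on cswiki: the difference between the Growth and Growth_no_mentorship columns there equals the growthexperiments-mentor-id row count, so the sketch below assumes that column simply excludes those rows. The variable and function names are illustrative.

```python
# cswiki figures taken from the two tables in this description.
GROWTH_ROWS = 297077
GROWTH_ROWS_NO_MENTORSHIP = 270141
TOTAL_ROWS = 2570822
MENTOR_ID_ROWS = 26936

# The two Growth columns differ by exactly the mentor-id row count on cswiki.
assert GROWTH_ROWS - GROWTH_ROWS_NO_MENTORSHIP == MENTOR_ID_ROWS


def share_after_removal(growth_rows: int, total_rows: int, removed: int) -> float:
    """Growth share (in %) once `removed` Growth rows are deleted from the table."""
    return round(100 * (growth_rows - removed) / (total_rows - removed), 2)


share_before = round(100 * GROWTH_ROWS / TOTAL_ROWS, 2)
share_after = share_after_removal(GROWTH_ROWS, TOTAL_ROWS, MENTOR_ID_ROWS)
print(share_before, share_after)
```

Under that assumption, deleting the mentor-id rows alone lowers the cswiki share by roughly one percentage point, so the other properties still account for most of the rows.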

Event Timeline

T54777#7724456 would let us avoid storing preferences and A/B flags on account creation. For tours we could probably reduce storage size (especially once temp users are introduced) by making "not seen" the default value. Changing the default value will be cumbersome though.

Two ideas:

  1. Reset user properties to defaults after some period of time (180 days? 90 days?) for inactive users. We delete welcome survey responses after 90 days; it would be like that, but for other properties.
  2. Use cookies on account creation for tours instead of database backed user properties
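
A minimal sketch of the first idea, assuming "reset to defaults" means deleting the stored row so the default value applies again. The row format, the `last_active` lookup, and the function name are hypothetical; a real implementation would be a maintenance script working against user_properties.

```python
from datetime import datetime, timedelta

# Assumption: every GrowthExperiments property is eligible for a reset.
RESETTABLE_PREFIX = "growthexperiments-"


def rows_to_reset(rows, last_active, now, max_inactive_days=90):
    """Return the (user_id, property) rows that belong to inactive users."""
    cutoff = now - timedelta(days=max_inactive_days)
    return [
        (user_id, prop)
        for user_id, prop in rows
        if prop.startswith(RESETTABLE_PREFIX) and last_active[user_id] < cutoff
    ]


now = datetime(2023, 1, 9)
rows = [
    (1, "growthexperiments-tour-help-panel"),
    (1, "growthexperiments-homepage-enable"),
    (2, "growthexperiments-tour-help-panel"),
]
# User 1 has been inactive for months, user 2 was active last week.
last_active = {1: datetime(2022, 6, 1), 2: datetime(2023, 1, 1)}
stale = rows_to_reset(rows, last_active, now)
print(stale)
```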

> Two ideas:
>
>   1. Reset user properties to defaults after some period of time (180 days? 90 days?) for inactive users. We delete welcome survey responses after 90 days; it would be like that, but for other properties.

I'm not sure that's a good idea. I'm afraid it would partly negate the effect the positive reinforcement project is meant to have on users, since it would make contributing harder when a user returns. It's also quite counter-intuitive (and, unless the account vanishes entirely, unexpected by users).

>   2. Use cookies on account creation for tours instead of database backed user properties

This sounds like a good idea to me. We might even be able to use a long-expiry WAN cache – I guess it's quite unlikely that users will see a tour for the first time after more than a month, so that might be enough.

I think this will roughly fall into three buckets:

  • A/B test placements and feature flags: if the core functionality mentioned in T54777#7724456 becomes available, these can probably be rewritten to be functionally equivalent but not store any preference data (other than users manually opting out of their A/B test placement).
  • tour "seen" flags, flags for the blue dot thingie: for more obscure tours, it might be fine to stick with the DB. For tours shown to a large fraction of new users (which is the case for most of our tours) we should be cautious about DB use. We could use cookies (disadvantages: users will see the tour again on a different device, inflates response size), local storage (disadvantages: users will see the tour again on a different device, tours must be fully client-side), use the DB with time limits (i.e. don't show the tour if the user is X days old, whether or not they have seen it), use the DB in some more compact way (bitflags?), or maybe use global user preferences (less storage space needed, it's on x1 where space is cheaper, probably better UX). Or maybe once A/B / feature-flag preferences are fixed this is not such a big deal anymore.
  • genuine data storage, e.g. welcome survey or mentor information. Probably not a big deal in itself if the other issues are solved.
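
To make the "bitflags?" option from the second bucket concrete, here is a minimal sketch that packs the four tour "seen" flags into one integer, so a user would need one stored value instead of up to four rows. The flag names mirror the growthexperiments-tour-* properties listed in the description; the class, the helper functions, and the idea of a single combined property are assumptions for illustration only.

```python
from enum import IntFlag


class TourSeen(IntFlag):
    # Explicit values: bit positions must stay stable once they are stored.
    HELP_PANEL = 1
    HOMEPAGE_MENTORSHIP = 2
    HOMEPAGE_DISCOVERY = 4
    HOMEPAGE_WELCOME = 8


def mark_seen(stored: int, tour: TourSeen) -> int:
    """Return the stored value with the given tour's bit set."""
    return stored | int(tour)


def has_seen(stored: int, tour: TourSeen) -> bool:
    """Check whether the given tour's bit is set in the stored value."""
    return bool(stored & int(tour))


stored = 0  # default: no tour seen, so nothing needs to be stored at all
stored = mark_seen(stored, TourSeen.HELP_PANEL)
stored = mark_seen(stored, TourSeen.HOMEPAGE_WELCOME)
print(stored, has_seen(stored, TourSeen.HOMEPAGE_DISCOVERY))
```

With "not seen" as the default, a user who has never seen a tour would not need a row at all, which matches the earlier point about making "not seen" the default value.
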
>   • tour "seen" flags, flags for the blue dot thingie: for more obscure tours, it might be fine to stick with the DB. For tours shown to a large fraction of new users (which is the case for most of our tours) we should be cautious about DB use. We could use cookies (disadvantages: users will see the tour again on a different device, inflates response size), local storage (disadvantages: users will see the tour again on a different device, tours must be fully client-side), use the DB with time limits (i.e. don't show the tour if the user is X days old, whether or not they have seen it), use the DB in some more compact way (bitflags?), or maybe use global user preferences (less storage space needed, it's on x1 where space is cheaper, probably better UX). Or maybe once A/B / feature-flag preferences are fixed this is not such a big deal anymore.

What about using the WANObjectCache for this? When a user registers, we'd create entries in the cache for each tour/navigation guide. As the user completes the tours, interacts with the blue dot, etc, we'd delete the cache item.
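
The flow described here can be sketched with a small in-memory stand-in for the cache. This is not the WANObjectCache API: the class, its methods, and the 30-day TTL are hypothetical and only model the "create on registration, delete on completion" idea.

```python
import time


class PendingTourCache:
    """In-memory stand-in for a TTL cache of tours a user has not seen yet."""

    def __init__(self, ttl_seconds, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # (user_id, tour) -> expiry timestamp

    def register_user(self, user_id, tours):
        """On account creation, mark every tour as pending."""
        expiry = self._clock() + self._ttl
        for tour in tours:
            self._entries[(user_id, tour)] = expiry

    def should_show(self, user_id, tour):
        """A tour is shown only while its entry exists and has not expired."""
        expiry = self._entries.get((user_id, tour))
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._entries[(user_id, tour)]
            return False
        return True

    def complete(self, user_id, tour):
        """Delete the entry once the user has finished or dismissed the tour."""
        self._entries.pop((user_id, tour), None)


now = [0.0]  # fake clock so the example is deterministic
cache = PendingTourCache(ttl_seconds=30 * 86400, clock=lambda: now[0])
cache.register_user(42, ["help-panel", "homepage-welcome"])
cache.complete(42, "help-panel")
shown_after_completion = cache.should_show(42, "help-panel")
shown_while_pending = cache.should_show(42, "homepage-welcome")
now[0] = 31 * 86400  # 31 days later the remaining entry has expired
shown_after_expiry = cache.should_show(42, "homepage-welcome")
```

One consequence of this design is that a missing or evicted entry means "do not show the tour", so losing cache data would hide a tour rather than repeat it.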

DMburugu lowered the priority of this task from Medium to Low. Jan 9 2023, 4:53 PM