User Details
- User Since
- Sep 2 2021, 7:28 AM (168 w, 2 d)
- Availability
- Available
- LDAP User
- SCherukuwada
- MediaWiki User
- SCherukuwada (WMF)
Thu, Nov 7
Katherine's manager is on leave, skip-level here approving.
Oct 17 2024
Just confirmed with Google that they've fixed this problem. I checked several examples. If you find broken ones, please add them here and reopen.
Oct 11 2024
Thank you.
- What form does this model take?
- Where is the code?
- How can MW code use this model?
- Where's the documentation that explains how to retrain the model / deploy it?
Jun 24 2024
Skip manager approves in manager's absence.
Manager is currently on leave, skip manager approves.
Jun 18 2024
Manager approves.
Jun 13 2024
@Robertsky We heard back from the property owner, who showed us graphs of traffic levels and said the volume was clearly more than they considered acceptable. Their concern was purely about volume, not about suspicious-looking user agent strings.
Jun 10 2024
I've reached out to Partnerships to surface this to Google.
We're learning more as we talk to the owners of sites that have decided to block Citoid, while also trying to establish how much traffic we're sending them.
Jun 9 2024
Ah, you're right. It seems to have fixed it for English Wiktionary, but not for French. Try the query "wiktionary" and see what it says for English results. There's a chance the French ones have just not been scraped yet, but I can't see that in Search Console yet (I don't have access for Wiktionary). Let me get access and see when they were last indexed.
Jun 7 2024
For Wiktionary, adding og:site_name seems to have fixed the problem.
We should probably take this with a pinch of salt, because it also works correctly for Commons without it.
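To check whether a page actually exposes og:site_name, something like the following could be used. This is only a minimal sketch: the sample HTML, the class name SiteNameParser, and the function extract_site_name are made up for illustration and are not taken from MediaWiki or from Google's tooling.

```python
from html.parser import HTMLParser
from typing import Optional


class SiteNameParser(HTMLParser):
    """Collects the content of the first <meta property="og:site_name"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.site_name: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only the first og:site_name tag is kept; later ones are ignored.
        if tag != "meta" or self.site_name is not None:
            return
        attributes = dict(attrs)
        if attributes.get("property") == "og:site_name":
            self.site_name = attributes.get("content")


def extract_site_name(html: str) -> Optional[str]:
    """Return the og:site_name value found in the HTML, or None if absent."""
    parser = SiteNameParser()
    parser.feed(html)
    parser.close()
    return parser.site_name


# Hypothetical page head, written by hand for this example.
SAMPLE_HTML = (
    "<html><head><title>wiktionary - Wiktionary</title>"
    '<meta property="og:site_name" content="Wiktionary">'
    "</head><body></body></html>"
)

if __name__ == "__main__":
    print(extract_site_name(SAMPLE_HTML))
```

This only confirms that the tag is present in the markup. It says nothing about whether a search engine chooses to use it.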
Reasoning as requested by @Reedy
Here's some context that I should have included when I filed the ticket:
Jun 6 2024
@Krinkle This could use a second pair of eyes to make sure I'm not missing something completely obvious if you could spare the time.
May 31 2024
Granted. Please check and close the task if everything is in order.
May 16 2024
In the Grafana dash, Saturation -> Total Network says it's about 10 MB/s. Does this count everything the job is doing, including what Zotero might be sending out?
May 15 2024
Idea 1 has the level of information and detail that my teams would find useful, especially in conversations around browser versions, support levels, and such.
May 3 2024
Looks good to me.
Apr 19 2024
I'm comfortable closing this task as resolved given that I've been getting search console requests from people following the process as documented and have been resolving them.
Apr 18 2024
Hello Taavi,
I think ThisDot should be in the table here: https://www.mediawiki.org/wiki/Gerrit/Privilege_policy#Expedited_process_for_trusted_organisations
Apr 8 2024
Just escalated to the folks talking to Google one more time.
Mar 8 2024
Fixed, thank you.
Jan 19 2024
The search console says this page is simply unknown to Google.
Jan 18 2024
I have a patch ready if someone would like to review it.
Jan 8 2024
Here is a summary of our discussions with Google (they proofread this summary):
Dec 18 2023
Update: First Input Delay is going to be deprecated in favour of Interaction to Next Paint.
Nov 28 2023
@Nicholas_Perry is the person talking to them.
Nov 21 2023
Oct 27 2023
Partnerships just told me they will reach out to Google soon. We'll post updates here.
Oct 23 2023
I've asked Partnerships to look into this too.
That bit you're seeing is called the "Site Name", as distinct from the page title. The idea behind it seems to be to tell you what site you're looking at (without needing to look at the URL itself, or relying on the page title to include the name of the site).
Oct 3 2023
The important takeaway from this (as per our discussion) was this bit:
Oct 2 2023
We met with Google to discuss this further. Google will provide more details on this soon, but the crux of the matter is that not all pages are guaranteed to be crawled, indexed, and served, as is stated on Search Central documentation.
@Seddon Could you please post an update here and link to relevant tickets?
Sep 29 2023
Folks, here's a plan of action.
Sep 15 2023
Sep 6 2023
Seddon and I are meeting on Friday. We'll have a concrete action plan (or the beginnings of one) to share on Monday.
Aug 24 2023
We're meeting with them in the next couple of weeks to troubleshoot our scraping problems. Will report back once we learn more.
Aug 16 2023
Manager here: I approve.
Aug 9 2023
This has been reported to Google. We're waiting for them to get back.
Aug 8 2023
@Jdlrobson Did the frontend standards group meeting come to a conclusion, or did it bring up any new insights not already shared in this ticket?
Aug 2 2023
Jul 10 2023
Thanks for all the inputs.
Jul 7 2023
@elukey Do we have another idea on the table aside from asking a team of Android devs (3) to maintain a recommendations service?
This has been granted for a period of one year.
Jun 26 2023
@Soda I need a gmail account to provide access to.
Jun 21 2023
OK, having talked to some folks in the Enterprise org and other teams, and having eliminated a few possible problems, the one I'm investigating now is the possibility that Google's bot is rate-limiting itself for some reason. I'll continue to post any findings here.
Jun 5 2023
Thank you for the supporting links.
Jun 2 2023
Here are some unindexed articles (confirmed from search console and from Google). I came upon them by simply hitting "Zufällige Seite" (Random Page) on de Wikisource and checking if the resulting page is indexed at all.
@Xover That's very useful to know.
@Soda Yeah Navboxes would indeed have helped.
@Krinkle Could you please answer the question I had above when you have a chance?
Jun 1 2023
As far as I can tell, a lot of the pages that aren't appearing in the index are simply not linked to from within the Wiki. There are no sitemaps any more, and Special:Lonelypages is uncrawlable because there's a robots.txt rule blocking all Special: pages from being crawled.
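The effect of that rule can be checked locally with Python's standard urllib.robotparser. This is a sketch under assumptions: the robots.txt fragment contains only the Disallow rule mentioned in these comments (the real file has more), the "User-agent: *" line is assumed, and the non-special page URL is just an example.

```python
from urllib.robotparser import RobotFileParser

# Assumed fragment: only the Disallow rule is quoted in the discussion;
# the "User-agent: *" line is an assumption made for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/Special:
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the assumed robots.txt fragment allows fetching the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://de.wikisource.org/wiki/Special:LonelyPages",
        "https://de.wikisource.org/wiki/Hauptseite",  # example content page
    ):
        print(url, "crawlable" if is_crawlable(url) else "blocked")
```

Under these assumptions, the Special:LonelyPages URL is reported as blocked and the content page as crawlable, which matches the observation that orphaned pages cannot be discovered through that special page.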
May 26 2023
May 24 2023
I think we can close this task for now.
May 23 2023
If the volunteer is given access to Search Console data, they will be able to examine the following information:
- which queries lead to search results on Google that have Wikis in them
- which results get clicked on by users
- the total query volume that has search results from the Wikis
- whether a given page from a Wiki property has been indexed or has problems preventing it from being indexed
- a breakdown of the above information by device platform, country, and date.
May 18 2023
Manager approval if needed: approved
May 17 2023
I've responded on the ticket with the volunteer. I'll handle it once they get the NDA and C-level approval out of the way.
May 16 2023
Please assign this to me once C-level approval and NDA have been taken care of.
May 15 2023
No strong opinions either way, but it seems to me that the examples that @pmiazga provides aren't all user types. "admin", "group:something", or "en_wiki:admin" are not really user types. They're memberships or privileges, which can grow and change as needed. Isn't there another way to represent those? I would imagine that a user type is something that's largely permanent, while privileges (through memberships) are expected to change.
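One way to picture the distinction is the sketch below. It is purely illustrative: the UserType values and the User class are hypothetical and do not reflect any existing MediaWiki schema. Only the membership strings ("admin", "group:something", "en_wiki:admin") come from the examples discussed here.

```python
from dataclasses import dataclass, field
from enum import Enum


class UserType(Enum):
    """Largely permanent kind of account (hypothetical values)."""

    ANONYMOUS = "anonymous"
    REGISTERED = "registered"
    BOT = "bot"


@dataclass
class User:
    name: str
    user_type: UserType  # expected to stay fixed for the account's lifetime
    memberships: set[str] = field(default_factory=set)  # expected to change

    def grant(self, membership: str) -> None:
        """Add a membership or privilege such as "en_wiki:admin"."""
        self.memberships.add(membership)

    def revoke(self, membership: str) -> None:
        """Remove a membership; does nothing if it was never granted."""
        self.memberships.discard(membership)

    def has(self, membership: str) -> bool:
        return membership in self.memberships


if __name__ == "__main__":
    user = User(name="Example", user_type=UserType.REGISTERED)
    user.grant("en_wiki:admin")
    user.grant("group:something")
    user.revoke("en_wiki:admin")
    print(user.user_type.value, sorted(user.memberships))
```

The type is set once, while memberships are granted and revoked over time without touching it.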
May 11 2023
Silly question: @Krinkle Given that robots.txt has a Disallow: /wiki/Special: rule, how do search engines read the LonelyPages or RecentChanges pages? As far as I can tell these pages aren't being indexed. Am I missing something?
Apologies for the ridiculous delay. I have Wikisource search console access now and am looking at it.
Apr 26 2023
Manager approves.
Mar 20 2023
WMF Staff who will need this on an ongoing basis. No expiry.
Mar 10 2023
Perhaps @Dbrant knows? I remember Jdlrobson mentioning that they did some work together on this.
Feb 15 2023
Rather than speculate on this one feature in isolation, I recommend letting @ovasileva decide and prioritize based on what else needs doing. She is likely to have a more holistic view and more insights from the community.
Feb 3 2023
Just had a conversation with @Catrope. I'm comfortable owning the risk and understand what that entails.
Jan 20 2023
That is not the case. Quoting from T326816 (please read for details):
Jan 17 2023
Please note that any changes to this extension might first need to be discussed in T326956.
Jan 13 2023
Dec 21 2022
I've mostly focused on search performance and stats for the Wikipedias and haven't had a chance to set up and build an understanding of where we are with Wikisource yet. That's why I don't have an immediate answer for this problem.
Dec 16 2022
The vendor replied with numbers covering the last two days, which might help us decide on a strategy. There is a new dump of IPs every day. Between the two most recent dumps (yesterday's and today's), there was:
- 10-12% daily change
- 1.7M new entries
- 2M aged-off entries (i.e. dropped/removed)
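For comparison with a full wipe and reload, the difference between two successive dumps could be computed as below. This is a rough sketch that assumes each dump can be loaded as a set of IP strings; the function name diff_dumps and the sample addresses (taken from documentation ranges) are made up, and the vendor's actual dump format is not described here.

```python
def diff_dumps(previous: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Return (new_entries, aged_off_entries) between two successive dumps."""
    new_entries = current - previous       # present today, absent yesterday
    aged_off_entries = previous - current  # present yesterday, absent today
    return new_entries, aged_off_entries


if __name__ == "__main__":
    # Toy data only; real dumps hold millions of entries.
    yesterday = {"192.0.2.1", "192.0.2.2", "198.51.100.7"}
    today = {"192.0.2.2", "198.51.100.7", "203.0.113.9"}

    added, removed = diff_dumps(yesterday, today)
    print("new:", sorted(added))
    print("aged off:", sorted(removed))
```

Whether applying such a diff is preferable to recreating the table from each dump depends on factors not covered here, such as write load and how the dump is delivered.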
Dec 15 2022
Thanks for replying, @Ladsgroup
Are DBA and SRE folks aware that this entire database will essentially be wiped every one or two days and recreated from a dump? Does that complicate things in any way?
Dec 14 2022
Apologies, I don't have access to Wikisource. @mpopov probably does.