Description
Some interesting data science questions raised during an email exchange with the maintainer:
- What is the distribution of how often a TinyIndex page gets updated, meaning the crawler uploaded a better data point? Some pages might get updated far more often than others and in fact never be stable. Tokens I'm thinking of are, for example, "fun" or "news". I don't remember the number of URLs that a single page can hold, but it would be interesting to know, on average, how long we estimate it takes for a page to be fully replaced (Ship of Theseus style). Obviously replacement is not uniform across the docs of a page, but it would still be interesting to estimate the "stability" of a page.
- What is the distribution of TinyIndex pages that are full vs. not full?
- What is the relationship between a page's size and its "stability"? This might indicate that some pages are full but unstable, so we are missing content.
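
To make the three questions concrete, here is a minimal sketch of how they could be measured. It assumes we can periodically record the set of URLs held by each page; the `PageSnapshot` record, the helper names, and the `capacity` parameter are all hypothetical and not part of the actual TinyIndex code. The real page capacity is not stated above, so it is left as an argument.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PageSnapshot:
    """Hypothetical record: the URLs stored in one page at one point in time."""
    page_id: int
    timestamp: float  # any monotonically increasing time unit
    urls: frozenset   # set of URL strings held by the page


def group_by_page(snapshots):
    """Return {page_id: [snapshots ordered by time]}."""
    pages = defaultdict(list)
    for snap in sorted(snapshots, key=lambda s: s.timestamp):
        pages[snap.page_id].append(snap)
    return dict(pages)


def update_count(history):
    """Number of consecutive snapshot pairs whose URL sets differ."""
    return sum(1 for a, b in zip(history, history[1:]) if a.urls != b.urls)


def full_replacement_time(history):
    """Time until no URL from the first snapshot remains, else None."""
    original = history[0].urls
    if not original:
        return None
    for later in history[1:]:
        if original.isdisjoint(later.urls):
            return later.timestamp - history[0].timestamp
    return None


def is_full(snapshot, capacity):
    """A page counts as full when it holds at least `capacity` URLs (assumed)."""
    return len(snapshot.urls) >= capacity


def mean_updates_by_fullness(pages, capacity):
    """Average update count for pages that end up full vs. not full."""
    groups = {"full": [], "not_full": []}
    for history in pages.values():
        key = "full" if is_full(history[-1], capacity) else "not_full"
        groups[key].append(update_count(history))
    return {k: mean(v) for k, v in groups.items() if v}


# Tiny made-up data set, only to show the calls.
example_snapshots = [
    PageSnapshot(1, 0.0, frozenset({"a", "b"})),
    PageSnapshot(1, 10.0, frozenset({"b", "c"})),
    PageSnapshot(1, 25.0, frozenset({"c", "d"})),
    PageSnapshot(2, 0.0, frozenset({"x"})),
    PageSnapshot(2, 10.0, frozenset({"x"})),
]
example_pages = group_by_page(example_snapshots)
for page_id, history in example_pages.items():
    print(page_id, update_count(history), full_replacement_time(history))
print(mean_updates_by_fullness(example_pages, capacity=2))
```

Comparing mean update counts between full and non-full pages is only a first cut at the size/stability question; with real data, a histogram of `update_count` and of `full_replacement_time` per page would answer the distribution questions more directly.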