Ublog recommender to increase old blog visibility #17315
blog-recommender sample code for michael
I'm not sure whether the change stream will hit the secondary mongo server too hard, since it would be looking at every single view (the view count is updated each time).
I know that neo4j has a quick turnaround for these requests even on prod data, but we'll need another cache in lila for this.
This adds 2 new technologies to the prod stack: python and neo4j. It also adds an HTTP server for lila to hit, and a connection to the prod mongodb replica set. And it will require babysitting when it gets out of sync with the db, which will happen over the years for a variety of reasons. These are serious infra commitments to carefully consider. Also I see it lists "mongodb (8.0.4)" as a dependency, which we don't have in prod.
True. I tried doing the same type of query directly in mongo and it was very slow / heavy. A graph database seems the correct technology for this service (unless we wanted to go for some ML approach, which is even more complexity). Python is already present in Kaladin and Irwin, so I didn't think that in itself was an issue. Neo4j could definitely be a concern, although the way I was thinking about it, Lila could fall back to showing other blogs of the user, as it does currently, in case of issues with this new service. Ultimately it is, or should be, a non-critical service.
The weekly refresh will avoid getting out of sync. But yes, the additional network and database load is my main concern / question.
I think it should also work with the current mongodb; I just listed the version I used. Ultimately, if we don't go ahead with integrating this, I understand. I think it's worth trying because it is non-critical, provided it doesn't disrupt / overload HTTP or mongo. There are definitely efficiencies to be made there as well if necessary: a more filtered change stream view, and batch responses for HTTP.
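One way the "more filtered change stream view" could look is sketched below. This is only an illustration, not code from the blogrecommender repo: the field name `likes` is an assumption about the `ublog_post` schema, and `build_pipeline` is a hypothetical helper. The idea is to let the server drop update events that only touch other fields (such as the view count) before they reach the recommender.

```python
from typing import Any

# Assumption: the likes of a post live in a top-level field called "likes".
# The real ublog_post field name may differ.
LIKES_FIELD = "likes"


def build_pipeline(likes_field: str = LIKES_FIELD) -> list[dict[str, Any]]:
    """Build a change stream pipeline that keeps inserts and replaces,
    and keeps updates only when the likes field itself was set.

    Caveat: depending on how likes are written, an array append may be
    reported under a key such as "likes.3" rather than "likes", in which
    case this simple $exists check would need to be adapted.
    """
    return [
        {
            "$match": {
                "$or": [
                    {"operationType": {"$in": ["insert", "replace"]}},
                    {
                        "operationType": "update",
                        f"updateDescription.updatedFields.{likes_field}": {
                            "$exists": True
                        },
                    },
                ]
            }
        }
    ]


pipeline = build_pipeline()
```

With pymongo, a pipeline like this would be passed to `collection.watch(pipeline)`, so view-count-only updates are filtered on the mongo side rather than in the Python process.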
```js
db.ublog_post.updateMany({similar:{$exists:false}},{$set:{similar:[]}})
```
We're not counting vertical pixels on that page anyway. Also, the file was only loaded for logged-in users, so the UI was broken for anons.
    height: 100%;
    // overflow: hidden; /* fixes crazy text overflow on Fx */
    min-width: 0;
🤔
Works in combination with https://github.com/michael1241/lila-blogrecommender
Process:
blogrecommender ingests ublog_post collection from mongodb
blogrecommender builds neo4j graph database of blog posts and users who liked the posts
blogrecommender generates projection / calculation of blog similarity based on how many overlapping users liked posts
lila queries blogrecommender over HTTP with a blog ID and gets back the IDs of up to 20 similar blogs
lila displays recommended blogs at bottom of blog post in carousel
blogrecommender watches the mongo replica change stream for insertions or edits to the ublog_post collection and adds to or updates the graph database accordingly (removals of likes are not picked up)
blogrecommender updates the graph projection hourly (for accuracy) and does a full database refresh weekly (to account for blog deletions, unliking and GDPR). The timeframes can be changed to whatever you think is best: full ingestion takes about 10 minutes, reprojection about 1 minute.
Still to do: set up on lichess server.
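To make the similarity step in the process above concrete, here is a minimal plain-Python sketch of "similarity based on how many overlapping users liked posts". It is not the neo4j projection the service actually uses. The function name `similar_posts`, the `(user_id, post_id)` input shape and the use of a raw overlap count with no normalisation are all assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Iterable


def similar_posts(
    likes: Iterable[tuple[str, str]], post_id: str, limit: int = 20
) -> list[str]:
    """Rank other posts by how many users liked both them and post_id.

    likes is an iterable of (user_id, post_id) pairs.
    """
    posts_by_user: dict[str, set[str]] = defaultdict(set)
    users_by_post: dict[str, set[str]] = defaultdict(set)
    for user, post in likes:
        posts_by_user[user].add(post)
        users_by_post[post].add(user)

    # For every user who liked the target post, count each other post they liked.
    overlap: Counter[str] = Counter()
    for user in users_by_post.get(post_id, ()):
        for other in posts_by_user[user]:
            if other != post_id:
                overlap[other] += 1

    # Highest overlap first; ties broken by post ID so the result is stable.
    ranked = sorted(overlap.items(), key=lambda kv: (-kv[1], kv[0]))
    return [post for post, _ in ranked[:limit]]


likes = [
    ("alice", "a"), ("alice", "b"),
    ("bob", "a"), ("bob", "b"), ("bob", "c"),
    ("carol", "c"),
]
print(similar_posts(likes, "a"))  # ['b', 'c']
```

Here post `b` shares two likers with `a` (alice and bob) and `c` shares one (bob), so `b` ranks first. A post nobody liked returns an empty list, which is where lila could fall back to showing the author's other blogs.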