
@michael1241
Contributor

Works in combination with https://github.com/michael1241/lila-blogrecommender

Process:

  • blogrecommender ingests the ublog_post collection from MongoDB

  • blogrecommender builds a Neo4j graph database of blog posts and the users who liked them

  • blogrecommender computes a graph projection of blog similarity, based on how many users liked both posts

  • lila queries blogrecommender over HTTP with a blog ID and gets back the IDs of up to 20 similar blogs

  • lila displays the recommended blogs in a carousel at the bottom of the blog post

  • blogrecommender watches the MongoDB replica set change stream for insertions into or edits to the ublog_post collection and adds to or updates the graph database accordingly (removed likes are not reflected)

  • blogrecommender re-runs the graph projection hourly (for accuracy) and does a full database refresh weekly (to account for blog deletions, unlikes and GDPR deletions). Both intervals can be changed to whatever you think is best: a full ingestion takes about 10 minutes, a reprojection about 1 minute.
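The similarity step in the process above can be sketched in memory as follows. This is a minimal illustration, assuming likes are available as (user_id, post_id) pairs; it scores two posts by the number of users who liked both, which is the overlap idea described above, and caps the answer at 20 IDs as lila's query does. It is not the Neo4j projection that lila-blogrecommender actually runs, and the names `co_like_index` and `recommend` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterable


def co_like_index(likes: Iterable[tuple[str, str]]) -> dict[str, Counter]:
    """Map each post ID to a Counter of other post IDs, weighted by shared likers."""
    # Group liked posts by user: user ID -> set of post IDs (duplicates collapse).
    posts_by_user: dict[str, set[str]] = defaultdict(set)
    for user, post in likes:
        posts_by_user[user].add(post)

    index: dict[str, Counter] = defaultdict(Counter)
    for posts in posts_by_user.values():
        # Every ordered pair of distinct posts liked by this user gains one shared liker.
        for a in posts:
            for b in posts:
                if a != b:
                    index[a][b] += 1
    return index


def recommend(index: dict[str, Counter], post_id: str, limit: int = 20) -> list[str]:
    """Return up to `limit` post IDs, most shared likers first (ties broken by ID)."""
    scores = index.get(post_id, Counter())
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [post for post, _ in ranked[:limit]]
```

A raw co-like count favours posts that are liked a lot overall; a normalised score such as Jaccard similarity is a common alternative. The inner pair loop is quadratic in the number of posts each user has liked, which is one reason to precompute the projection in a batch job rather than per request.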

Still to do: set it up on the lichess server.

@michael1241 michael1241 requested a review from ornicar April 8, 2025 12:12
@michael1241
Contributor Author

I'm not sure whether the change stream will hit the secondary Mongo server too hard, since it sees every single view (the view count is updated on each one).
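One way to cut the change-stream traffic raised here is to filter on the server, so that updates touching only the view counter are never sent to the recommender. The sketch below assumes the view count lives in a field named `views` (an assumption, not checked against lila's schema) and builds a `$match` pipeline that could be passed to a change stream; `is_relevant` is a hypothetical client-side mirror of the same rule so the logic can be checked without a database.

```python
VIEW_FIELD = "views"  # assumed name of the view counter field in ublog_post


def change_stream_pipeline(view_field: str = VIEW_FIELD) -> list:
    """Pipeline for a change stream: keep inserts/replaces, and updates that
    change at least one field other than `view_field`."""
    other_updated_fields = {
        "$filter": {
            "input": {"$objectToArray": "$updateDescription.updatedFields"},
            # $filter binds each {k, v} element to $$this when `as` is omitted.
            "cond": {"$ne": ["$$this.k", view_field]},
        }
    }
    return [
        {
            "$match": {
                "$or": [
                    {"operationType": {"$in": ["insert", "replace"]}},
                    {
                        "operationType": "update",
                        "$expr": {"$gt": [{"$size": other_updated_fields}, 0]},
                    },
                ]
            }
        }
    ]


def is_relevant(event: dict, view_field: str = VIEW_FIELD) -> bool:
    """Client-side mirror of the same rule (hypothetical helper)."""
    op = event.get("operationType")
    if op in ("insert", "replace"):
        return True
    if op == "update":
        updated = event.get("updateDescription", {}).get("updatedFields", {})
        return any(key != view_field for key in updated)
    return False
```

With PyMongo the pipeline would be passed as `db.ublog_post.watch(change_stream_pipeline())`; this is untested against the prod replica set. The server still has to read every oplog entry, so the filter mainly saves network traffic and work in the recommender.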

@schlawg
Contributor

schlawg commented Apr 8, 2025

I know that Neo4j has a quick turnaround for these requests even on prod data, but we'll need another cache in lila for this.
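A cache in front of the recommender could be as simple as a time-bounded map from post ID to the returned list. lila is written in Scala and has its own caching utilities, so the Python below is only a language-neutral sketch; the class name `TtlCache`, the one-hour default TTL (picked to match the hourly reprojection) and the injectable clock are all assumptions.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class TtlCache(Generic[V]):
    """Hypothetical helper: memoise `loader(key)` for `ttl` seconds per key."""

    def __init__(
        self,
        loader: Callable[[Hashable], V],
        ttl: float = 3600.0,  # one hour, matching the hourly reprojection (assumption)
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        # key -> (expiry timestamp, cached value); unbounded, stale entries are only overwritten.
        self._entries: Dict[Hashable, Tuple[float, V]] = {}

    def get(self, key: Hashable) -> V:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the loader call
        value = self._loader(key)  # missing or expired: reload
        self._entries[key] = (now + self._ttl, value)
        return value
```

Cached recommendations can lag the recommender by up to one TTL; a production cache would also cap its size.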

@ornicar
Collaborator

ornicar commented Apr 9, 2025

This adds 2 new technologies to the prod stack: python and neo4j.

It also adds an HTTP server for lila to hit, and a connection to the prod mongodb replicaset. And it will require baby-sitting when it gets out of sync with the db, which will happen over the years for a variety of reasons.

These are serious infra commitments to carefully consider.

Also I see it lists "mongodb (8.0.4)" as a dependency, which we don't have in prod.

@michael1241
Contributor Author

> This adds 2 new technologies to the prod stack: python and neo4j.

True. I tried running the same kind of query directly in Mongo and it was very slow and heavy. A graph database seems the right technology for this service (unless we wanted to go for an ML approach, which would add even more complexity). Python is already used in Kaladin and Irwin, so I didn't think that in itself was an issue. Neo4j could definitely be a concern, although the way I see it, if this new service has issues, lila could fall back to showing the author's other blogs, as it does now. Ultimately it is, or should be, a non-critical service.
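The fallback described here could look roughly like the sketch below: ask the recommender with a short timeout and, on any failure or empty answer, show the author's other posts instead. The names `recommended_posts`, `fetch_similar` and `author_posts` and the 0.5 s timeout are hypothetical; in lila this logic would live in Scala.

```python
from typing import Callable, Sequence


def recommended_posts(
    post_id: str,
    fetch_similar: Callable[[str, float], Sequence[str]],  # hypothetical HTTP client call
    author_posts: Callable[[str], Sequence[str]],  # hypothetical "other posts by this author" query
    timeout: float = 0.5,
    limit: int = 20,
) -> list[str]:
    """Prefer recommender results; fall back to the author's other posts."""
    try:
        similar = list(fetch_similar(post_id, timeout))
    except Exception:
        # Any recommender failure (timeout, connection error, bad payload) is non-fatal.
        similar = []
    if similar:
        return similar[:limit]
    return [p for p in author_posts(post_id) if p != post_id][:limit]
```

Treating an empty answer like a failure also covers new posts that have no likes yet.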

> It also adds an HTTP server for lila to hit, and a connection to the prod mongodb replicaset. And it will require baby-sitting when it gets out of sync with the db, which will happen over the years for a variety of reasons.
>
> These are serious infra commitments to carefully consider.

The weekly full refresh will prevent it from getting out of sync. But yes, the additional network and database load is my main concern and question.

> Also I see it lists "mongodb (8.0.4)" as a dependency, which we don't have in prod.

I think it should also work with the MongoDB version currently in prod; I just listed the version I used.

Ultimately, if we don't go ahead with integrating this, I understand. I think it's worth trying because it's non-critical, provided it doesn't disrupt or overload HTTP or Mongo. There are also definitely efficiencies to be had there if necessary: a more narrowly filtered change stream, and batched HTTP responses.
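The batching idea mentioned here amounts to answering many post IDs in one round trip instead of one HTTP request per post. The sketch below shows one possible shape, a JSON list of IDs in and a JSON object mapping each ID to its recommended IDs out; the payload format, the `handle_batch` name and the `lookup` callable are assumptions, not blogrecommender's actual API.

```python
import json
from typing import Callable, Sequence


def handle_batch(body: str, lookup: Callable[[str], Sequence[str]], limit: int = 20) -> str:
    """Answer a JSON list of post IDs with a JSON object {id: [similar ids]}."""
    ids = json.loads(body)
    if not isinstance(ids, list) or not all(isinstance(i, str) for i in ids):
        raise ValueError("expected a JSON list of post ID strings")
    # dict.fromkeys de-duplicates the requested IDs while keeping their order.
    result = {pid: list(lookup(pid))[:limit] for pid in dict.fromkeys(ids)}
    return json.dumps(result)
```

Unknown IDs map to an empty list rather than an error, so one bad ID does not fail the whole batch.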

@ornicar ornicar marked this pull request as ready for review April 9, 2025 15:26
@ornicar
Collaborator

ornicar commented Apr 9, 2025

The UI is broken; there should be a card grid there.
[screenshot]

@ornicar
Collaborator

ornicar commented Apr 9, 2025

[screenshot]

ornicar added 4 commits April 9, 2025 19:47
We're not counting vertical pixels on that page anyway.

Also, the file was only loaded for logged-in users, so the UI was broken for anons.
height: 100%;

// overflow: hidden; /* fixes crazy text overflow on Fx */
min-width: 0;
Collaborator


🤔

@ornicar ornicar merged commit fa11ba8 into master Apr 9, 2025
8 checks passed
@schlawg schlawg deleted the ublog-recommender branch April 22, 2025 16:14

4 participants