🔥 Recommendations for Ruby and Rails using collaborative filtering
- Supports user-based and item-based recommendations
- Works with explicit and implicit feedback
- Uses high-performance matrix factorization
Add this line to your application’s Gemfile:
gem "disco"
Create a recommender
recommender = Disco::Recommender.new
If users rate items directly, this is known as explicit feedback. Fit the recommender with:
recommender.fit([
  {user_id: 1, item_id: 1, rating: 5},
  {user_id: 2, item_id: 1, rating: 3}
])
IDs can be integers, strings, or any other data type
If users don’t rate items directly (for instance, they’re purchasing items or reading posts), this is known as implicit feedback. Leave out the rating.
recommender.fit([
  {user_id: 1, item_id: 1},
  {user_id: 2, item_id: 1}
])
Each user_id/item_id combination should only appear once
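Raw event logs often contain the same user/item pair several times. A minimal sketch of removing the repeats in plain Ruby before fitting; the `events` array is made-up illustration data, not part of Disco:

```ruby
# Raw implicit events; the second row repeats the first (made-up data).
events = [
  {user_id: 1, item_id: 1},
  {user_id: 1, item_id: 1},
  {user_id: 2, item_id: 1}
]

# Keep only the first occurrence of each user_id/item_id pair.
data = events.uniq { |e| [e[:user_id], e[:item_id]] }
```

The resulting `data` has one row per pair and can be passed to `recommender.fit`. For explicit feedback, you would also need to decide which rating to keep when a pair repeats.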
Get user-based recommendations - “users like you also liked”
recommender.user_recs(user_id)
Get item-based recommendations - “users who liked this item also liked”
recommender.item_recs(item_id)
Use the count option to specify the number of recommendations (default is 5)
recommender.user_recs(user_id, count: 3)
Get predicted ratings for specific users and items
recommender.predict([{user_id: 1, item_id: 2}, {user_id: 2, item_id: 4}])
Get similar users
recommender.similar_users(user_id)
Load the data
data = Disco.load_movielens
Create a recommender and get similar movies
recommender = Disco::Recommender.new(factors: 20)
recommender.fit(data)
recommender.item_recs("Star Wars (1977)")
Ahoy is a great source for implicit feedback
views = Ahoy::Event.where(name: "Viewed post").group(:user_id).group_prop(:post_id).count
data =
  views.map do |(user_id, post_id), _|
    {
      user_id: user_id,
      item_id: post_id
    }
  end
Create a recommender and get recommended posts for a user
recommender = Disco::Recommender.new
recommender.fit(data)
recommender.user_recs(current_user.id)
Disco makes it easy to store recommendations in Rails.
rails generate disco:recommendation
rails db:migrate
For user-based recommendations, use:
class User < ApplicationRecord
  has_recommended :products
end
Change :products to match the model you’re recommending
Save recommendations
User.find_each do |user|
  recs = recommender.user_recs(user.id)
  user.update_recommended_products(recs)
end
Get recommendations
user.recommended_products
For item-based recommendations, use:
class Product < ApplicationRecord
  has_recommended :products
end
Specify multiple types of recommendations for a model with:
class User < ApplicationRecord
  has_recommended :products
  has_recommended :products_v2, class_name: "Product"
end
And use the appropriate methods:
user.update_recommended_products_v2(recs)
user.recommended_products_v2
If you’d prefer to perform recommendations on-the-fly, store the recommender
json = recommender.to_json
File.write("recommender.json", json)
The serialized recommender includes user activity from the training data (to avoid recommending previously rated items), so be sure to protect it. You can save it to a file, database, or any other storage system, or use a tool like Trove. User and item IDs should be integers or strings when the recommender is serialized this way.
Load a recommender
json = File.read("recommender.json")
recommender = Disco::Recommender.load_json(json)
Alternatively, you can store only the factors and use a library like Neighbor. See the examples.
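As a rough illustration of what working with stored factors involves, similar items can be ranked by the cosine similarity of their factor vectors. This is a plain-Ruby sketch, not Disco's or Neighbor's code; the `item_factors` hash, its values, and the `nearest_items` helper are hypothetical:

```ruby
# Cosine similarity between two equal-length factor vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end

# Rank all other items by similarity to the given item (hypothetical helper).
def nearest_items(item_factors, item_id, count: 5)
  target = item_factors.fetch(item_id)
  item_factors
    .reject { |id, _| id == item_id }
    .map { |id, vec| {item_id: id, score: cosine_similarity(target, vec)} }
    .max_by(count) { |rec| rec[:score] }
end

# Made-up factors keyed by item ID.
item_factors = {
  "A" => [1.0, 0.0],
  "B" => [0.9, 0.1],
  "C" => [0.0, 1.0]
}

nearest = nearest_items(item_factors, "A", count: 2)
```

This scans every item, which is fine for small catalogs. A database-backed nearest neighbor search is the better fit once the number of items grows.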
Disco uses high-performance matrix factorization.
- For explicit feedback, it uses stochastic gradient descent
- For implicit feedback, it uses coordinate descent
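To make "factors" concrete: matrix factorization represents each user and each item as a short vector, and a predicted score is built from their dot product. The sketch below uses made-up numbers and a hypothetical `predict_rating` helper; adding a global mean as an offset is an assumption about how explicit ratings are centered, not a description of Disco's internals:

```ruby
# Predicted score = offset + dot product of the user and item vectors.
def predict_rating(user_vec, item_vec, global_mean: 0.0)
  global_mean + user_vec.zip(item_vec).sum { |u, i| u * i }
end

user_vec = [0.5, -1.0] # made-up user factors
item_vec = [2.0, 0.5]  # made-up item factors

prediction = predict_rating(user_vec, item_vec, global_mean: 3.5)
# 3.5 + (0.5 * 2.0) + (-1.0 * 0.5) = 4.0
```

More factors give each vector more room to capture distinct tastes, at the cost of needing more data to fit them well.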
Specify the number of factors and epochs
Disco::Recommender.new(factors: 8, epochs: 20)
If recommendations look off, try changing factors. The default is 8, but 3 could be good for some applications and 300 good for others.
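One way to judge whether a different number of factors helps is to hold out part of the data for validation. A minimal sketch of a random split in plain Ruby; the `train_validation_split` helper, the 20% fraction, and the seed are assumptions for illustration:

```ruby
# Shuffle with a fixed seed and hold out a fraction of rows for validation.
def train_validation_split(data, validation_fraction: 0.2, seed: 42)
  shuffled = data.shuffle(random: Random.new(seed))
  validation_size = (shuffled.size * validation_fraction).round
  validation_set = shuffled.first(validation_size)
  train_set = shuffled.drop(validation_size)
  [train_set, validation_set]
end

# Made-up ratings data.
data = (1..10).map { |i| {user_id: i, item_id: i % 3, rating: (i % 5) + 1} }

train_set, validation_set = train_validation_split(data)
```

The training rows go to `fit` as the data and the held-out rows as the validation set.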
Pass a validation set with:
recommender.fit(data, validation_set: validation_set)
Collaborative filtering suffers from the cold start problem. It’s unable to make good recommendations without data on a user or item, which is problematic for new users and items.
recommender.user_recs(new_user_id) # returns empty array
There are a number of ways to deal with this, but here are some common ones:
- For user-based recommendations, show new users the most popular items
- For item-based recommendations, make content-based recommendations with a gem like tf-idf-similarity
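The first option can be sketched as a small helper that falls back to a list of popular items when no recommendations come back. The `recs_with_fallback` name and the `{item_id:, score:}` hash shape are assumptions for illustration, and the arrays stand in for real recommender output:

```ruby
# Return the personalized recs if there are any, otherwise the popular items.
def recs_with_fallback(recs, popular, count: 5)
  return recs.first(count) unless recs.empty?
  popular.first(count)
end

popular = [{item_id: 10, score: 0.9}, {item_id: 11, score: 0.8}] # made-up
known_user_recs = [{item_id: 3, score: 0.7}]                     # made-up
new_user_recs = []                                               # cold start

for_known_user = recs_with_fallback(known_user_recs, popular)
for_new_user = recs_with_fallback(new_user_recs, popular, count: 1)
```

In an application, the first argument would be the result of `user_recs` and the second a precomputed list of top items.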
Get top items with:
recommender = Disco::Recommender.new(top_items: true)
recommender.fit(data)
recommender.top_items
This uses Wilson score for explicit feedback and item frequency for implicit feedback.
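For intuition, the Wilson score lower bound ranks an item by a pessimistic estimate of its share of positive votes, so items with few ratings are not ranked above well-established ones. The sketch below is the standard formula for positive/negative votes in plain Ruby; how Disco maps ratings onto it is not covered here, and the `wilson_lower_bound` helper is hypothetical:

```ruby
# Lower bound of the Wilson score interval for a proportion.
# positive: number of positive votes, total: number of votes,
# z: normal quantile (1.96 corresponds to roughly 95% confidence).
def wilson_lower_bound(positive, total, z: 1.96)
  return 0.0 if total.zero?
  p_hat = positive.to_f / total
  z2 = z * z
  centre = p_hat + z2 / (2 * total)
  margin = z * Math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * total)) / total)
  (centre - margin) / (1 + z2 / total)
end

few_votes = wilson_lower_bound(9, 10)
many_votes = wilson_lower_bound(90, 100)
```

Both items have 90% positive votes, but the one with 100 votes gets the higher score because there is more evidence behind it.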
Data can be an array of hashes
[{user_id: 1, item_id: 1, rating: 5}, {user_id: 2, item_id: 1, rating: 3}]
Or a Rover data frame
Rover.read_csv("ratings.csv")
Or a Daru data frame
Daru::DataFrame.from_csv("ratings.csv")
If you have a large number of users or items, you can use an approximate nearest neighbors library like Faiss to improve the performance of certain methods.
Add this line to your application’s Gemfile:
gem "faiss"
Speed up the user_recs method with:
recommender.optimize_user_recs
Speed up the item_recs method with:
recommender.optimize_item_recs
Speed up the similar_users method with:
recommender.optimize_similar_users
This should be called after fitting or loading the recommender.
Get ids
recommender.user_ids
recommender.item_ids
Get the global mean
recommender.global_mean
Get factors
recommender.user_factors
recommender.item_factors
Get factors for specific users and items
recommender.user_factors(user_id)
recommender.item_factors(item_id)
Thanks to:
- LIBMF for providing high performance matrix factorization
- Implicit for serving as an initial reference for user and item similarity
- @dasch for the gem name
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/disco.git
cd disco
bundle install
bundle exec rake test