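TensorFlow Recommenders is a library for building recommender system models using TensorFlow.
It helps with the full workflow of building a recommender system: data preparation, model formulation, training, evaluation, and deployment.
It is built on Keras and aims to have a gentle learning curve while still giving you the flexibility to build complex models.
Make sure you have TensorFlow 2.x installed, then install the library from pip: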
pip install tensorflow-recommenders
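Before running the example in this README, it can help to confirm that the packages it imports are available. The following is a minimal sketch that uses only the Python standard library. The helper name `check_installed` is hypothetical and is not part of TensorFlow Recommenders. The module names are the ones imported by the example; `tensorflow_datasets` is a separate package and may need to be installed on its own.

```python
import importlib.util
from typing import Dict, Iterable


def check_installed(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether Python can locate it.

    Hypothetical helper for illustration; it only looks the module up
    and does not import it, so it is cheap to run.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Import names used by the example in this README.
    required = ["tensorflow", "tensorflow_datasets", "tensorflow_recommenders"]
    for name, found in check_installed(required).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Any module reported as missing would need to be installed before the example can run.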
Building a factorization model for the Movielens 100K dataset is very simple (Colab):
from typing import Dict, Text

import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs

# Ratings data.
ratings = tfds.load('movielens/100k-ratings', split="train")
# Features of all the available movies.
movies = tfds.load('movielens/100k-movies', split="train")

# Select the basic features.
ratings = ratings.map(lambda x: {
    "movie_id": tf.strings.to_number(x["movie_id"]),
    "user_id": tf.strings.to_number(x["user_id"])
})
movies = movies.map(lambda x: tf.strings.to_number(x["movie_id"]))

# Build a model.
class Model(tfrs.Model):

  def __init__(self):
    super().__init__()

    # Set up user representation.
    self.user_model = tf.keras.layers.Embedding(
        input_dim=2000, output_dim=64)
    # Set up movie representation.
    self.item_model = tf.keras.layers.Embedding(
        input_dim=2000, output_dim=64)
    # Set up a retrieval task and evaluation metrics over the
    # entire dataset of candidates.
    self.task = tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=movies.batch(128).map(self.item_model)
        )
    )

  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
    # Embed the user and movie IDs, then let the task compute the loss.
    user_embeddings = self.user_model(features["user_id"])
    movie_embeddings = self.item_model(features["movie_id"])

    return self.task(user_embeddings, movie_embeddings)


model = Model()
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.5))

# Randomly shuffle data and split between train and test.
tf.random.set_seed(42)
shuffled = ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

# Train.
model.fit(train.batch(4096), epochs=5)

# Evaluate.
model.evaluate(test.batch(4096), return_dict=True)
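The evaluation step reports top-K metrics: each user embedding is scored against every candidate movie embedding, and the metric checks whether the movie the user actually rated lands among the K highest scores. The following is a plain-Python sketch of that idea, not the library's implementation. The function names `dot`, `top_k_indices`, and `top_k_accuracy`, and the toy two-dimensional embeddings, are assumptions made for illustration; the real metric operates on batched tensors.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product, used here as the affinity score between two embeddings."""
    return sum(x * y for x, y in zip(a, b))


def top_k_indices(query: Vector, candidates: Sequence[Vector], k: int) -> List[int]:
    """Return the indices of the k candidates with the highest scores."""
    scores = [dot(query, c) for c in candidates]
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def top_k_accuracy(
    queries: Sequence[Vector],
    true_indices: Sequence[int],
    candidates: Sequence[Vector],
    k: int,
) -> float:
    """Fraction of queries whose true candidate appears in the top k."""
    hits = sum(
        1
        for query, true_index in zip(queries, true_indices)
        if true_index in top_k_indices(query, candidates, k)
    )
    return hits / len(queries)


# Toy embeddings (illustrative values, not learned from any dataset).
candidates = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [[1.0, 0.1], [0.1, 1.0], [1.0, -1.0]]
true_indices = [0, 1, 1]  # index of the candidate each query actually matches

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"top-{k} accuracy:", top_k_accuracy(queries, true_indices, candidates, k))
```

In this toy data the third query points away from its true candidate, so it only counts as a hit once K covers all three candidates. Scoring against the full candidate set is what makes this kind of evaluation expensive on large catalogs.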